[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119727#comment-17119727 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r432580685 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -463,14 +486,29 @@ ConvertedType convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation } } - private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block) { + private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block, Review comment: It is not a blocking comment and I am fine with it. But generally speaking, adding too much nested if/else diverges the code path and causes the complexity for reading. One way to avoid duplicating is to wrap them up in helper functions. I understand column indexes and bloom filters already did that but if we keep adding features like this, the code will become less and less readable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119473#comment-17119473 ] ASF GitHub Bot commented on PARQUET-1229: - andersonm-1 commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r432393858 ## File path: parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestBloomEncryption.java ## @@ -0,0 +1,313 @@ +/* Review comment: Thank you, @gszadovszky , for the suggestion. We've added it to #782 . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119463#comment-17119463 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky merged pull request #776: URL: https://github.com/apache/parquet-mr/pull/776 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118716#comment-17118716 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r431856097 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -1071,12 +1229,31 @@ public BloomFilterReader getBloomFilterDataReader(BlockMetaData block) { */ public BloomFilter readBloomFilter(ColumnChunkMetaData meta) throws IOException { long bloomFilterOffset = meta.getBloomFilterOffset(); + +if (0 == bloomFilterOffset) { // TODO Junjie - is there a better way to handle this? Review comment: yep, this works; also, makes sense to me - we should not proceed to read a bloom filter header (including seeking its offset), if there is no bloom filter in the file; this can be checked via the `bloomFilterOffset` variable - it can't be 0 if a bloom filter is present; and if no bloom filter, it can be 0 only. I'll remove the TODO comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118703#comment-17118703 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r431856097 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -1071,12 +1229,31 @@ public BloomFilterReader getBloomFilterDataReader(BlockMetaData block) { */ public BloomFilter readBloomFilter(ColumnChunkMetaData meta) throws IOException { long bloomFilterOffset = meta.getBloomFilterOffset(); + +if (0 == bloomFilterOffset) { // TODO Junjie - is there a better way to handle this? Review comment: yep, this works; also, makes sense to me - we should not proceed to read a bloom filter header, if there is no bloom filter in the file; this can be checked via the `bloomFilterOffset` variable - it can't be 0 if a bloom filter is present; and if no bloom filter, it can be 0 only. I'll remove the TODO comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118691#comment-17118691 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r431848468 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java ## @@ -165,10 +200,10 @@ protected static boolean positiveLongFitsInAnInt(long value) { return (value >= 0) && (value + Integer.MIN_VALUE <= Integer.MAX_VALUE); } - private final EncodingStats encodingStats; + protected EncodingStats encodingStats; Review comment: Sure, will change this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117737#comment-17117737 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-634639953 The current review round is addressed; the checks pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117539#comment-17117539 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r430964257 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -1114,7 +1299,20 @@ public ColumnIndex readColumnIndex(ColumnChunkMetaData column) throws IOExceptio return null; } f.seek(ref.getOffset()); -return ParquetMetadataConverter.fromParquetColumnIndex(column.getPrimitiveType(), Util.readColumnIndex(f)); + +column.decryptIfNeededed(); Review comment: found a better place for it, inside the `ColumnChunkMetaData` object This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117537#comment-17117537 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r430962619 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java ## @@ -188,7 +189,11 @@ private void checkDeltaByteArrayProblem(FileMetaData meta, Configuration conf, B // this is okay if not using DELTA_BYTE_ARRAY with the bug Set encodings = new HashSet(); for (ColumnChunkMetaData column : block.getColumns()) { -encodings.addAll(column.getEncodings()); +try { + encodings.addAll(column.getEncodings()); +} catch (KeyAccessDeniedException e) { Review comment: removed the suppression This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117455#comment-17117455 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r430902400 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java ## @@ -165,10 +200,10 @@ protected static boolean positiveLongFitsInAnInt(long value) { return (value >= 0) && (value + Integer.MIN_VALUE <= Integer.MAX_VALUE); } - private final EncodingStats encodingStats; + protected EncodingStats encodingStats; Review comment: these fields are accessed by the `EncryptedColumnChunkMetaData` class, that sets their values after decrypting the column metadata. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117454#comment-17117454 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r430901237 ## File path: parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestBloomEncryption.java ## @@ -0,0 +1,313 @@ +/* Review comment: this will be a part of https://github.com/apache/parquet-mr/pull/782 (it will also remove the TestBloomEncryption.java and TestColumnIndexEncryption.java files). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113915#comment-17113915 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-632617498 > > I know, this is not the whole feature yet but would like to ensure that at the end we will have proper test coverity. So, the question is do we have or planning to have tests to verify the working of semi-encrypted files (some columns and the footer are plaintext) with previous readers. > > We test this manually. We also have an automatic test that runs the current reader code on files that were previously created with other writers (such as parquet-cpp with encryption; these files are kept in parquet-testing repo and fetched from there). But I don't know how to automate running of previous readers on the files written by the current code - ? Good question. We should have the tooling to provide such tests but not sure if we already have it. I am fine with testing it manually for now just document it in details (e.g. adding to the git comment of the test PR). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113254#comment-17113254 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-632140012 > Now, we are in sync. I will try to keep it this way. ;) Appreciated! Addressing today's comments will keep me busy for a while :), I'll get back next week. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113247#comment-17113247 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428710799 ## File path: parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestEncryption.java ## @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.parquet.hadoop; + +import static org.junit.Assert.assertEquals; +import static org.apache.parquet.hadoop.TestUtils.enforceEmptyDir; +import static org.apache.parquet.hadoop.metadata.CompressionCodecName.UNCOMPRESSED; +import static org.apache.parquet.schema.MessageTypeParser.parseMessageType; + +import java.nio.charset.StandardCharsets; +import java.util.HashMap; +import java.util.Random; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.junit.Rule; +import org.junit.Test; + +import org.apache.parquet.crypto.ColumnEncryptionProperties; +import org.apache.parquet.crypto.FileDecryptionProperties; +import org.apache.parquet.crypto.FileEncryptionProperties; +import org.apache.parquet.crypto.ParquetCipher; +import org.apache.parquet.crypto.StringKeyIdRetriever; +import org.apache.parquet.example.data.Group; +import org.apache.parquet.example.data.simple.SimpleGroupFactory; + + +import org.apache.parquet.hadoop.example.GroupReadSupport; +import org.apache.parquet.hadoop.example.GroupWriteSupport; +import org.apache.parquet.hadoop.metadata.ColumnPath; +import org.apache.parquet.io.api.Binary; +import org.apache.parquet.schema.MessageType; +import org.junit.rules.TemporaryFolder; + +public class TestEncryption { + + @Test + public void test() throws Exception { +Configuration conf = new Configuration(); +Path root = new Path("target/tests/TestEncryption/"); +enforceEmptyDir(conf, root); + +Random random = new Random(); +int numberOfEncryptionModes = 5; +FileEncryptionProperties[] encryptionPropertiesList = new FileEncryptionProperties[numberOfEncryptionModes]; +FileDecryptionProperties[] decryptionPropertiesList = new FileDecryptionProperties[numberOfEncryptionModes]; + +// #0 Unencrypted - make sure null encryption properties don't break regular Parquet +encryptionPropertiesList[0] = null; +decryptionPropertiesList[0] = null; + +// #1 Basic encryption setup +byte[] encryptionKey = new byte[16]; +random.nextBytes(encryptionKey); +FileEncryptionProperties encryptionProperties = FileEncryptionProperties.builder(encryptionKey).build(); +FileDecryptionProperties decryptionProperties = FileDecryptionProperties.builder().withFooterKey(encryptionKey).build(); +encryptionPropertiesList[1] = encryptionProperties; +decryptionPropertiesList[1] = decryptionProperties; + +// #2 Default algorithm, non-uniform encryption, key metadata, key retriever, AAD prefix +byte[] footerKey = new byte[16]; +random.nextBytes(footerKey); +byte[] columnKey0 = new byte[16]; +random.nextBytes(columnKey0); +byte[] columnKey1 = new byte[16]; +random.nextBytes(columnKey1); +ColumnEncryptionProperties columnProperties0 = ColumnEncryptionProperties.builder("binary_field") +.withKey(columnKey0) +.withKeyID("ck0") +.build(); +ColumnEncryptionProperties columnProperties1 = ColumnEncryptionProperties.builder("int32_field") +.withKey(columnKey1) +.withKeyID("ck1") +.build(); +HashMap columnPropertiesMap = new HashMap(); +columnPropertiesMap.put(columnProperties0.getPath(), columnProperties0); +columnPropertiesMap.put(columnProperties1.getPath(), columnProperties1); +byte[] AADPrefix = root.getName().getBytes(StandardCharsets.UTF_8); +encryptionProperties = FileEncryptionProperties.builder(footerKey) +.withFooterKeyID("fk") +.withAADPrefix(AADPrefix) +.withEncryptedColumns(columnPropertiesMap) +.build(); +StringKeyIdRetriever keyRetriever = new StringKeyIdRetriever();
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113242#comment-17113242 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-632130361 > I know, this is not the whole feature yet but would like to ensure that at the end we will have proper test coverity. So, the question is do we have or planning to have tests to verify the working of semi-encrypted files (some columns and the footer are plaintext) with previous readers. We test this manually. We also have an automatic test that runs the current reader code on files that were previously created with other writers (such as parquet-cpp with encryption; these files are kept in parquet-testing repo and fetched from there). But I don't know how to automate running of previous readers on the files written by the current code - ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113214#comment-17113214 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428673721 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -90,6 +102,10 @@ // Update last two bytes with new page ordinal (instead of creating new page AAD from scratch) public static void quickUpdatePageAAD(byte[] pageAAD, short newPageOrdinal) { +if (newPageOrdinal < 0) { Review comment: I agree on validating the arguments is important. But the proper exception to be thrown for a `null` is a `NullPointerException` that would be thrown at line 134 anyway. If you really want to validate the argument for `null` at the first line of the method I would suggest using `java.util.Objects.requireNonNull(Object)`. ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -78,11 +78,22 @@ if (rowGroupOrdinal < 0) { throw new IllegalArgumentException("Wrong row group ordinal: " + rowGroupOrdinal); } -byte[] rowGroupOrdinalBytes = shortToBytesLE(rowGroupOrdinal); +short shortRGOrdinal = (short) rowGroupOrdinal; +if (shortRGOrdinal != rowGroupOrdinal) { + throw new ParquetCryptoRuntimeException("Encrypted parquet files can't have " + + "more than Short.MAX_VALUE row groups: " + rowGroupOrdinal); Review comment: I think, writing the actual value instead of referencing a java constant is more informative. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113204#comment-17113204 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428562074 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -1131,7 +1329,19 @@ public OffsetIndex readOffsetIndex(ColumnChunkMetaData column) throws IOExceptio return null; } f.seek(ref.getOffset()); -return ParquetMetadataConverter.fromParquetOffsetIndex(Util.readOffsetIndex(f)); + +column.decryptIfNeededed(); Review comment: Same as above. ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java ## @@ -1035,20 +1154,89 @@ private static void serializeBloomFilters( long offset = out.getPos(); column.setBloomFilterOffset(offset); - Util.writeBloomFilterHeader(ParquetMetadataConverter.toBloomFilterHeader(bloomFilter), out); -bloomFilter.writeTo(out); + +BlockCipher.Encryptor bloomFilterEncryptor = null; +byte[] bloomFilterHeaderAAD = null; +byte[] bloomFilterBitsetAAD = null; +if (null != fileEncryptor) { + InternalColumnEncryptionSetup columnEncryptionSetup = fileEncryptor.getColumnSetup(column.getPath(), false, (short) cIndex); + if (columnEncryptionSetup.isEncrypted()) { +bloomFilterEncryptor = columnEncryptionSetup.getMetaDataEncryptor(); +short columnOrdinal = columnEncryptionSetup.getOrdinal(); +bloomFilterHeaderAAD = AesCipher.createModuleAAD(fileEncryptor.getFileAAD(), ModuleType.BloomFilterHeader, +block.getOrdinal(), columnOrdinal, (short)-1); +bloomFilterBitsetAAD = AesCipher.createModuleAAD(fileEncryptor.getFileAAD(), ModuleType.BloomFilterBitset, +block.getOrdinal(), columnOrdinal, (short)-1); + } +} + + Util.writeBloomFilterHeader(ParquetMetadataConverter.toBloomFilterHeader(bloomFilter), out, +bloomFilterEncryptor, bloomFilterHeaderAAD); + +ByteArrayOutputStream tempOutStream = new ByteArrayOutputStream(); +bloomFilter.writeTo(tempOutStream); +byte[] serializedBitset = tempOutStream.toByteArray(); +if (null != bloomFilterEncryptor) { + serializedBitset = bloomFilterEncryptor.encrypt(serializedBitset, bloomFilterBitsetAAD); +} +out.write(serializedBitset); } } } - - private static void serializeFooter(ParquetMetadata footer, PositionOutputStream out) throws IOException { -long footerIndex = out.getPos(); + + private static void serializeFooter(ParquetMetadata footer, PositionOutputStream out, + InternalFileEncryptor fileEncryptor) throws IOException { + ParquetMetadataConverter metadataConverter = new ParquetMetadataConverter(); -org.apache.parquet.format.FileMetaData parquetMetadata = metadataConverter.toParquetMetadata(CURRENT_VERSION, footer); -writeFileMetaData(parquetMetadata, out); -LOG.debug("{}: footer length = {}" , out.getPos(), (out.getPos() - footerIndex)); -BytesUtils.writeIntLittleEndian(out, (int) (out.getPos() - footerIndex)); -out.write(MAGIC); + +// Unencrypted file +if (null == fileEncryptor) { + long footerIndex = out.getPos(); + org.apache.parquet.format.FileMetaData parquetMetadata = metadataConverter.toParquetMetadata(CURRENT_VERSION, footer); + writeFileMetaData(parquetMetadata, out); + LOG.debug("{}: footer length = {}" , out.getPos(), (out.getPos() - footerIndex)); + BytesUtils.writeIntLittleEndian(out, (int) (out.getPos() - footerIndex)); + out.write(MAGIC); + return; +} + +org.apache.parquet.format.FileMetaData parquetMetadata = +metadataConverter.toParquetMetadata(CURRENT_VERSION, footer, fileEncryptor); + +// Encrypted file with plaintext footer +if (!fileEncryptor.isFooterEncrypted()) { Review comment: I don't know why we think that the footer length is an important information but this is the only case where we do not log it. We might want to add it here as well. ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -1071,12 +1229,31 @@ public BloomFilterReader getBloomFilterDataReader(BlockMetaData block) { */ public BloomFilter readBloomFilter(ColumnChunkMetaData meta) throws IOException { long bloomFilterOffset = meta.getBloomFilterOffset(); + +if (0 == bloomFilterOffset) { // TODO Junjie - is there a better way to handle this? Review comment: @chenjunjiedada, could you please check this? ## File path:
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113075#comment-17113075 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428584998 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -705,22 +797,39 @@ public ParquetFileReader(Configuration conf, Path file, ParquetMetadata footer) paths.put(ColumnPath.get(col.getPath()), col); } this.crc = options.usePageChecksumVerification() ? new CRC32() : null; +this.fileDecryptor = fileMetaData.getFileDecryptor(); } public ParquetFileReader(InputFile file, ParquetReadOptions options) throws IOException { +this(file, options, null); + } + + public ParquetFileReader(InputFile file, ParquetReadOptions options, + FileDecryptionProperties fileDecryptionProperties) throws IOException { this.converter = new ParquetMetadataConverter(options); this.file = file; this.f = file.newStream(); this.options = options; +if ((null == fileDecryptionProperties) && (file instanceof HadoopInputFile)) { + HadoopInputFile hadoopFile = (HadoopInputFile) file; + fileDecryptionProperties = getDecryptionProperties(hadoopFile.getPath(), hadoopFile.getConfiguration()); Review comment: Sounds good. I'll check what can be done here. Also, will check how the deprecated readFooter functions can be handled for encryption. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112990#comment-17112990 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428535767 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -442,7 +460,13 @@ static ParquetMetadata readSummaryMetadata(Configuration configuration, Path bas */ @Deprecated public static final ParquetMetadata readFooter(Configuration configuration, Path file) throws IOException { -return readFooter(configuration, file, NO_FILTER); +return readFooter(configuration, file, getDecryptionProperties(file, configuration)); + } + + @Deprecated + public static final ParquetMetadata readFooter(Configuration configuration, Path file, Review comment: So, all of these new methods are used inside parquet-mr? If not, then I don't think we need them. If yes, then please, try to refactor the caller part to use the non-deprecated ones instead. If it does not require too much effort. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112833#comment-17112833 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428465094 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -705,22 +797,39 @@ public ParquetFileReader(Configuration conf, Path file, ParquetMetadata footer) paths.put(ColumnPath.get(col.getPath()), col); } this.crc = options.usePageChecksumVerification() ? new CRC32() : null; +this.fileDecryptor = fileMetaData.getFileDecryptor(); } public ParquetFileReader(InputFile file, ParquetReadOptions options) throws IOException { +this(file, options, null); + } + + public ParquetFileReader(InputFile file, ParquetReadOptions options, + FileDecryptionProperties fileDecryptionProperties) throws IOException { this.converter = new ParquetMetadataConverter(options); this.file = file; this.f = file.newStream(); this.options = options; +if ((null == fileDecryptionProperties) && (file instanceof HadoopInputFile)) { + HadoopInputFile hadoopFile = (HadoopInputFile) file; + fileDecryptionProperties = getDecryptionProperties(hadoopFile.getPath(), hadoopFile.getConfiguration()); Review comment: We need the file `Path` in order to create the file decryption properties. `ParquetReadOptions` and `HadoopReadOptions` don't keep/handle the file paths. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112812#comment-17112812 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428455071 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -442,7 +460,13 @@ static ParquetMetadata readSummaryMetadata(Configuration configuration, Path bas */ @Deprecated public static final ParquetMetadata readFooter(Configuration configuration, Path file) throws IOException { -return readFooter(configuration, file, NO_FILTER); +return readFooter(configuration, file, getDecryptionProperties(file, configuration)); + } + + @Deprecated + public static final ParquetMetadata readFooter(Configuration configuration, Path file, Review comment: These are encrypting versions of the existing `readFooter` functions, already marked as deprecated, but still actively used (eg in parquet-cli and in Spark). The deprecation comment says "@ deprecated will be removed in 2.0.0". Since we are not at parquet 2.0 yet, I've marked the encrypting versions of these functions as deprecated too, in order not to forget to handle them when working on parquet-2.0. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112252#comment-17112252 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428008353 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -499,55 +528,100 @@ public static final ParquetMetadata readFooter(Configuration configuration, File */ @Deprecated public static final ParquetMetadata readFooter(InputFile file, MetadataFilter filter) throws IOException { +return readFooter(file, filter, null); + } + + @Deprecated + public static final ParquetMetadata readFooter(InputFile file, MetadataFilter filter, Review comment: Why do we introduce new public methods that are deprecated already? ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -1465,23 +1674,38 @@ public void writeDataPageV2Header( dataEncoding, rlByteLength, dlByteLength), to); } - + Review comment: Please, check your code to not to introduce any trailing whitespaces. ## File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java ## @@ -442,7 +460,13 @@ static ParquetMetadata readSummaryMetadata(Configuration configuration, Path bas */ @Deprecated public static final ParquetMetadata readFooter(Configuration configuration, Path file) throws IOException { -return readFooter(configuration, file, NO_FILTER); +return readFooter(configuration, file, getDecryptionProperties(file, configuration)); + } + + @Deprecated + public static final ParquetMetadata readFooter(Configuration configuration, Path file, Review comment: Why do we introduce new public methods that are deprecated already? ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) { return offset; } + private static void verifyFooterIntegrity(InputStream from, InternalFileDecryptor fileDecryptor, + int combinedFooterLength) throws IOException { + +byte[] nonce = new byte[AesCipher.NONCE_LENGTH]; +from.read(nonce); +byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH]; +from.read(gcmTag); + +AesGcmEncryptor footerSigner = fileDecryptor.createSignedFooterEncryptor(); + +byte[] footerAndSignature = ((ByteBufferInputStream) from).slice(0).array(); +int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH; +byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength]; +System.arraycopy(footerAndSignature, 0, serializedFooter, 0, serializedFooter.length); + +byte[] signedFooterAAD = AesCipher.createFooterAAD(fileDecryptor.getFileAAD()); +byte[] encryptedFooterBytes = footerSigner.encrypt(false, serializedFooter, nonce, signedFooterAAD); +byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH]; +System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - AesCipher.GCM_TAG_LENGTH, +calculatedTag, 0, AesCipher.GCM_TAG_LENGTH); +if (!Arrays.equals(gcmTag, calculatedTag)) { + throw new TagVerificationException("Signature mismatch in plaintext footer"); +} + } + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter) throws IOException { +return readParquetMetadata(from, filter, null, false, 0); + } + + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter, + final InternalFileDecryptor fileDecryptor, final boolean encryptedFooter, + final int combinedFooterLength) throws IOException { + +final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? fileDecryptor.fetchFooterDecryptor() : null); +final byte[] encryptedFooterAAD = (encryptedFooter? AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null); + FileMetaData fileMetaData = filter.accept(new MetadataFilterVisitor() { @Override public FileMetaData visit(NoFilter filter) throws IOException { -return readFileMetaData(from); +return readFileMetaData(from, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(SkipMetadataFilter filter) throws IOException { -return readFileMetaData(from, true); +return readFileMetaData(from, true, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(OffsetMetadataFilter filter) throws IOException { -return filterFileMetaDataByStart(readFileMetaData(from), filter); +
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112247#comment-17112247 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427843291 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -68,19 +67,32 @@ public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) { Review comment: Also - like with the page numbers in the previous comment, having too many row groups will adversely affect encryption performance. There are per-rowgroup encryption operations, always performed on small buffers - therefore, very slow (no hardware acceleration, etc). Having dozens/hundreds of thousands (or more) of them will significantly affect the overall encryption time of a file. Moreover, having lots of row groups might lead to having smaller data pages, which decreases the performance further. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112078#comment-17112078 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427934346 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -68,19 +67,32 @@ public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) { Review comment: Sounds good to me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112076#comment-17112076 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427932476 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -68,19 +67,32 @@ public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) { Review comment: Thanks for the value checking tip, I'll update the code to use it. As for a centralization - I think this function (`createModuleAAD`) is the right place. In the encryption feature, ordinals are used only for integrity verification - performed via AADs, which are calculated here for both encryption and decryption. Everywhere in the code, the ordinals will be an `int`. Since the `createModuleAAD` is called only for encrypted files, an exception will be thrown only for them (if an ordinal exceeds the `Short.MAX_VALUE`). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112063#comment-17112063 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427925111 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -68,19 +67,32 @@ public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) { Review comment: I understand that large numbers of pages/row groups or columns would lead to significant performance drawbacks but it should not limit what the spec allows otherwise. Since it is discussed and approved already, I am fine with using `short` values for these. What I would suggest adding though is to have the conversion from `int` to `short` centralized and and have specific error messages so it is clear that the limit reached is a hard limit for the encryption feature. Also, if we will publish any description/example for the encryption feature these limitations shall be listed there. One more thing: the check of `intValue > Short.MAX_VALUE` is not complete. In case of `intValue` is negative the cast may result in a valid positive `short` value. I would suggest using `(short) intValue != intValue` instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112057#comment-17112057 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427922312 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -68,19 +67,32 @@ public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) { Review comment: Still, the variables in this function (and elsewhere) don't have to be `short`. After looking at the code, it seems `int`s are better suited for managing and checking these parameters (and for enabling any values in unencrypted files). I'll make this change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111945#comment-17111945 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427845827 ## File path: parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndex.java ## @@ -49,6 +49,13 @@ * @return the index of the first row in the page */ public long getFirstRowIndex(int pageIndex); + + /** + * @param pageIndex + * the index of the page + * @return the original ordinal of the page in the column chunk + */ + public short getPageOrdinal(int pageIndex); Review comment: Plus - the page headers are also encrypted. These are small, so the hardware acceleration is not applied on them. Having dozens/hundreds of thousands (or more) of page headers will significantly affect the overall encryption time of a file. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111799#comment-17111799 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427759101 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) { return offset; } + private static void verifyFooterIntegrity(InputStream from, InternalFileDecryptor fileDecryptor, + int combinedFooterLength) throws IOException { + +byte[] nonce = new byte[AesCipher.NONCE_LENGTH]; +from.read(nonce); +byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH]; +from.read(gcmTag); + +AesGcmEncryptor footerSigner = fileDecryptor.createSignedFooterEncryptor(); + +byte[] footerAndSignature = ((ByteBufferInputStream) from).slice(0).array(); +int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH; +byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength]; +System.arraycopy(footerAndSignature, 0, serializedFooter, 0, serializedFooter.length); + +byte[] signedFooterAAD = AesCipher.createFooterAAD(fileDecryptor.getFileAAD()); +byte[] encryptedFooterBytes = footerSigner.encrypt(false, serializedFooter, nonce, signedFooterAAD); +byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH]; +System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - AesCipher.GCM_TAG_LENGTH, +calculatedTag, 0, AesCipher.GCM_TAG_LENGTH); +if (!Arrays.equals(gcmTag, calculatedTag)) { + throw new TagVerificationException("Signature mismatch in plaintext footer"); +} + } + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter) throws IOException { +return readParquetMetadata(from, filter, null, false, 0); + } + + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter, + final InternalFileDecryptor fileDecryptor, final boolean encryptedFooter, + final int combinedFooterLength) throws IOException { + +final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? fileDecryptor.fetchFooterDecryptor() : null); +final byte[] encryptedFooterAAD = (encryptedFooter? AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null); + FileMetaData fileMetaData = filter.accept(new MetadataFilterVisitor() { @Override public FileMetaData visit(NoFilter filter) throws IOException { -return readFileMetaData(from); +return readFileMetaData(from, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(SkipMetadataFilter filter) throws IOException { -return readFileMetaData(from, true); +return readFileMetaData(from, true, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(OffsetMetadataFilter filter) throws IOException { -return filterFileMetaDataByStart(readFileMetaData(from), filter); +return filterFileMetaDataByStart(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter); } @Override public FileMetaData visit(RangeMetadataFilter filter) throws IOException { -return filterFileMetaDataByMidpoint(readFileMetaData(from), filter); +return filterFileMetaDataByMidpoint(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter); } }); LOG.debug("{}", fileMetaData); -ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData); + +if (!encryptedFooter && null != fileDecryptor) { + if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file Review comment: In cryptography, plaintext is an opposite of ciphertext (the result of plaintext encryption). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs --
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111782#comment-17111782 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427745576 ## File path: parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndex.java ## @@ -49,6 +49,13 @@ * @return the index of the first row in the page */ public long getFirstRowIndex(int pageIndex); + + /** + * @param pageIndex + * the index of the page + * @return the original ordinal of the page in the column chunk + */ + public short getPageOrdinal(int pageIndex); Review comment: The background discussion is here, https://github.com/apache/parquet-mr/pull/776#discussion_r427743861 In the case of pages, encryption becomes an order (or two orders) of magnitude slower if the pages are small. Basically, the hardware acceleration does not kick in with small pages (and there are additional problems). This is another reason not to allow more than 32K pages in a chunk. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111775#comment-17111775 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427743861 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -68,19 +67,32 @@ public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) { Review comment: The links to the discussion on this, https://github.com/apache/parquet-format/pull/114#discussion_r234022579 https://github.com/apache/parquet-format/pull/114#discussion_r232941138 http://mail-archives.apache.org/mod_mbox/parquet-dev/201901.mbox/%3CCAO4re1kM4xGMNT4CGrjvA43t-QgUmUwLMskTJfd8ivgCfF8rSw%40mail.gmail.com%3E The parquet-cpp approach to this is to allow for any number of row groups in files without encryption, and to limit it to 32K in encrypted files, https://github.com/apache/arrow/commit/0c5168cf203ddf94dff01c178d649853323acbb8 "While writing files with so many row groups is a bad idea, people will still do it... This .. enables reading the many-row-group files again. Files with encrypted row group metadata with that many row groups cannot be read" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111206#comment-17111206 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427284527 ## File path: parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndex.java ## @@ -49,6 +49,13 @@ * @return the index of the first row in the page */ public long getFirstRowIndex(int pageIndex); + + /** + * @param pageIndex + * the index of the page + * @return the original ordinal of the page in the column chunk + */ + public short getPageOrdinal(int pageIndex); Review comment: Why do we need it as a `short` instead of keeping it as an `int`? As per the parquet.thrift spec we never say that we cannot have more pages than `32767` even if it is unlikely to have such many. ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -68,19 +67,32 @@ public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) { Review comment: Theoretically we don't give hard limits for the number of row groups, number of columns or the number of pages in the spec. There is a de facto limit that we use thrift lists where the size is an i32 meaning that we should allow java int values here. Also, there was a post commit discussion in a [related PR](https://github.com/apache/parquet-format/pull/142#issuecomment-600294754). It is unfortunate that that time parquet-format was already released so I don't know if there is a way to properly fix this issue in the format. Anyway, I would not restrict these values to a `short`. ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesGcmDecryptor.java ## @@ -98,16 +104,21 @@ ((lengthBuffer[0] & 0xff)); if (ciphertextLength < 1) { - throw new IOException("Wrong length of encrypted metadata: " + ciphertextLength); + throw new ParquetCryptoRuntimeException("Wrong length of encrypted metadata: " + ciphertextLength); } byte[] ciphertextBuffer = new byte[ciphertextLength]; gotBytes = 0; // Read the encrypted structure contents while (gotBytes < ciphertextLength) { - int n = from.read(ciphertextBuffer, gotBytes, ciphertextLength - gotBytes); + int n; + try { +n = from.read(ciphertextBuffer, gotBytes, ciphertextLength - gotBytes); + } catch (IOException e) { Review comment: We should let the `IOException` thrown out. ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesGcmDecryptor.java ## @@ -70,23 +69,30 @@ if (null != AAD) cipher.updateAAD(AAD); cipher.doFinal(ciphertext, inputOffset, inputLength, plainText, outputOffset); -} catch (GeneralSecurityException e) { - throw new IOException("Failed to decrypt", e); +} catch (AEADBadTagException e) { + throw new TagVerificationException("GCM tag check failed", e); +} catch (GeneralSecurityException e) { + throw new ParquetCryptoRuntimeException("Failed to decrypt", e); } return plainText; } @Override - public byte[] decrypt(InputStream from, byte[] AAD) throws IOException { + public byte[] decrypt(InputStream from, byte[] AAD) { byte[] lengthBuffer = new byte[SIZE_LENGTH]; int gotBytes = 0; // Read the length of encrypted Thrift structure while (gotBytes < SIZE_LENGTH) { - int n = from.read(lengthBuffer, gotBytes, SIZE_LENGTH - gotBytes); + int n; + try { +n = from.read(lengthBuffer, gotBytes, SIZE_LENGTH - gotBytes); + } catch (IOException e) { Review comment: We should let the `IOException` thrown out. ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) { return offset; } + private static void verifyFooterIntegrity(InputStream from, InternalFileDecryptor fileDecryptor, + int combinedFooterLength) throws IOException { + +byte[] nonce = new byte[AesCipher.NONCE_LENGTH]; +from.read(nonce); +byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH]; +from.read(gcmTag); + +AesGcmEncryptor footerSigner = fileDecryptor.createSignedFooterEncryptor(); + +byte[] footerAndSignature = ((ByteBufferInputStream) from).slice(0).array(); +int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH; +byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength]; +System.arraycopy(footerAndSignature, 0, serializedFooter,
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103615#comment-17103615 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-626266715 @ggershinsky Do you have time to review the code? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103614#comment-17103614 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r422577368 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -463,14 +486,29 @@ ConvertedType convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation } } - private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block) { + private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block, Review comment: Not all the changes are isolated. Generally, adding 'if/else' will add diverge the code and add the complexity. One other thing is regression thinking. If fileEncryptor is null, which would be most of the case, then it just executes the existing method without change. It would be less error prone. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103604#comment-17103604 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r422575326 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) { return offset; } + private static void verifyFooterIntegrity(InputStream from, InternalFileDecryptor fileDecryptor, + int combinedFooterLength) throws IOException { + +byte[] nonce = new byte[AesCipher.NONCE_LENGTH]; +from.read(nonce); +byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH]; +from.read(gcmTag); + +AesGcmEncryptor footerSigner = fileDecryptor.createSignedFooterEncryptor(); + +byte[] footerAndSignature = ((ByteBufferInputStream) from).slice(0).array(); +int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH; +byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength]; +System.arraycopy(footerAndSignature, 0, serializedFooter, 0, serializedFooter.length); + +byte[] signedFooterAAD = AesCipher.createFooterAAD(fileDecryptor.getFileAAD()); +byte[] encryptedFooterBytes = footerSigner.encrypt(false, serializedFooter, nonce, signedFooterAAD); +byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH]; +System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - AesCipher.GCM_TAG_LENGTH, +calculatedTag, 0, AesCipher.GCM_TAG_LENGTH); +if (!Arrays.equals(gcmTag, calculatedTag)) { + throw new TagVerificationException("Signature mismatch in plaintext footer"); +} + } + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter) throws IOException { +return readParquetMetadata(from, filter, null, false, 0); + } + + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter, + final InternalFileDecryptor fileDecryptor, final boolean encryptedFooter, + final int combinedFooterLength) throws IOException { + +final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? fileDecryptor.fetchFooterDecryptor() : null); +final byte[] encryptedFooterAAD = (encryptedFooter? AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null); + FileMetaData fileMetaData = filter.accept(new MetadataFilterVisitor() { @Override public FileMetaData visit(NoFilter filter) throws IOException { -return readFileMetaData(from); +return readFileMetaData(from, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(SkipMetadataFilter filter) throws IOException { -return readFileMetaData(from, true); +return readFileMetaData(from, true, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(OffsetMetadataFilter filter) throws IOException { -return filterFileMetaDataByStart(readFileMetaData(from), filter); +return filterFileMetaDataByStart(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter); } @Override public FileMetaData visit(RangeMetadataFilter filter) throws IOException { -return filterFileMetaDataByMidpoint(readFileMetaData(from), filter); +return filterFileMetaDataByMidpoint(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter); } }); LOG.debug("{}", fileMetaData); -ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData); + +if (!encryptedFooter && null != fileDecryptor) { + if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file Review comment: No. I see most of them on up line but a few on the same line. It is not a must. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098269#comment-17098269 ] ASF GitHub Bot commented on PARQUET-1229: - prdpsvs commented on pull request #472: URL: https://github.com/apache/parquet-mr/pull/472#issuecomment-623066084 When is this feature going to be available? Any alternatives available today? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093195#comment-17093195 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415622844 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) { return offset; } + private static void verifyFooterIntegrity(InputStream from, InternalFileDecryptor fileDecryptor, + int combinedFooterLength) throws IOException { + +byte[] nonce = new byte[AesCipher.NONCE_LENGTH]; +from.read(nonce); +byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH]; +from.read(gcmTag); + +AesGcmEncryptor footerSigner = fileDecryptor.createSignedFooterEncryptor(); + +byte[] footerAndSignature = ((ByteBufferInputStream) from).slice(0).array(); +int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH; +byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength]; +System.arraycopy(footerAndSignature, 0, serializedFooter, 0, serializedFooter.length); + +byte[] signedFooterAAD = AesCipher.createFooterAAD(fileDecryptor.getFileAAD()); +byte[] encryptedFooterBytes = footerSigner.encrypt(false, serializedFooter, nonce, signedFooterAAD); +byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH]; +System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - AesCipher.GCM_TAG_LENGTH, +calculatedTag, 0, AesCipher.GCM_TAG_LENGTH); +if (!Arrays.equals(gcmTag, calculatedTag)) { + throw new TagVerificationException("Signature mismatch in plaintext footer"); +} + } + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter) throws IOException { +return readParquetMetadata(from, filter, null, false, 0); + } + + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter, + final InternalFileDecryptor fileDecryptor, final boolean encryptedFooter, + final int combinedFooterLength) throws IOException { + +final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? fileDecryptor.fetchFooterDecryptor() : null); +final byte[] encryptedFooterAAD = (encryptedFooter? AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null); + FileMetaData fileMetaData = filter.accept(new MetadataFilterVisitor() { @Override public FileMetaData visit(NoFilter filter) throws IOException { -return readFileMetaData(from); +return readFileMetaData(from, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(SkipMetadataFilter filter) throws IOException { -return readFileMetaData(from, true); +return readFileMetaData(from, true, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(OffsetMetadataFilter filter) throws IOException { -return filterFileMetaDataByStart(readFileMetaData(from), filter); +return filterFileMetaDataByStart(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter); } @Override public FileMetaData visit(RangeMetadataFilter filter) throws IOException { -return filterFileMetaDataByMidpoint(readFileMetaData(from), filter); +return filterFileMetaDataByMidpoint(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter); } }); LOG.debug("{}", fileMetaData); -ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData); + +if (!encryptedFooter && null != fileDecryptor) { + if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file Review comment: is there a requirement in the code formatting rules in this community to keep comments in separate lines? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and >
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093193#comment-17093193 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415621398 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -463,14 +486,29 @@ ConvertedType convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation } } - private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block) { + private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block, Review comment: encryption is isolated there, with _if (null != fileEncryptor)_ and _if (encryptMetaData)_ - similar to the _if (columnIndexRef != null)_ and _if (offsetIndexRef != null)_ in the same function. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093189#comment-17093189 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415617873 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -90,6 +102,10 @@ // Update last two bytes with new page ordinal (instead of creating new page AAD from scratch) public static void quickUpdatePageAAD(byte[] pageAAD, short newPageOrdinal) { +if (newPageOrdinal < 0) { Review comment: Ok, I'll add this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093187#comment-17093187 ] ASF GitHub Bot commented on PARQUET-1229: - gszadovszky commented on pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-619820989 > > preferably reviewed after the Travis fix is in. @gszadovszky @shangxinli can you apply #777 to the encryption branch. > > It seems I still don't have permission to do so. @gszadovszky, I did 'git cherry-pick' and then git push to encryption branch', but got permission error. Do I have to create a PR for it? @shangxinli, I've done a rebase on the branch `encryption`. Maybe, that's why you were unable to push a commit because after the rebase it would not be a fast forward commit (would need a force-push). (Because of the rebase the Travis fix is already in.) If you have any problems with manual push to the repo, let's bring it offline. I'm happy to help. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092906#comment-17092906 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415446216 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) { return offset; } + private static void verifyFooterIntegrity(InputStream from, InternalFileDecryptor fileDecryptor, + int combinedFooterLength) throws IOException { + +byte[] nonce = new byte[AesCipher.NONCE_LENGTH]; +from.read(nonce); +byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH]; +from.read(gcmTag); + +AesGcmEncryptor footerSigner = fileDecryptor.createSignedFooterEncryptor(); + +byte[] footerAndSignature = ((ByteBufferInputStream) from).slice(0).array(); +int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH; +byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength]; +System.arraycopy(footerAndSignature, 0, serializedFooter, 0, serializedFooter.length); + +byte[] signedFooterAAD = AesCipher.createFooterAAD(fileDecryptor.getFileAAD()); +byte[] encryptedFooterBytes = footerSigner.encrypt(false, serializedFooter, nonce, signedFooterAAD); +byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH]; +System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - AesCipher.GCM_TAG_LENGTH, +calculatedTag, 0, AesCipher.GCM_TAG_LENGTH); +if (!Arrays.equals(gcmTag, calculatedTag)) { + throw new TagVerificationException("Signature mismatch in plaintext footer"); +} + } + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter) throws IOException { +return readParquetMetadata(from, filter, null, false, 0); + } + + public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter, + final InternalFileDecryptor fileDecryptor, final boolean encryptedFooter, + final int combinedFooterLength) throws IOException { + +final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? fileDecryptor.fetchFooterDecryptor() : null); +final byte[] encryptedFooterAAD = (encryptedFooter? AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null); + FileMetaData fileMetaData = filter.accept(new MetadataFilterVisitor() { @Override public FileMetaData visit(NoFilter filter) throws IOException { -return readFileMetaData(from); +return readFileMetaData(from, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(SkipMetadataFilter filter) throws IOException { -return readFileMetaData(from, true); +return readFileMetaData(from, true, footerDecryptor, encryptedFooterAAD); } @Override public FileMetaData visit(OffsetMetadataFilter filter) throws IOException { -return filterFileMetaDataByStart(readFileMetaData(from), filter); +return filterFileMetaDataByStart(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter); } @Override public FileMetaData visit(RangeMetadataFilter filter) throws IOException { -return filterFileMetaDataByMidpoint(readFileMetaData(from), filter); +return filterFileMetaDataByMidpoint(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter); } }); LOG.debug("{}", fileMetaData); -ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData); + +if (!encryptedFooter && null != fileDecryptor) { + if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file Review comment: Move comments up This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092904#comment-17092904 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415444590 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -463,14 +486,29 @@ ConvertedType convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation } } - private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block) { + private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block, Review comment: Should we keep original addRowGroup() intact and just add a new one with InternalFileEncryptor to isolate the change's impact? There is not much duplicate code if doing so and we can refactor the existing code with helper functions. In the majority use cases, they are non-encryption cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092903#comment-17092903 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415444590 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -463,14 +486,29 @@ ConvertedType convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation } } - private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block) { + private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block, Review comment: Should we keep original addRowGroup() intact and just add a new one with InternalFileEncryptor? There is not much duplicate code if doing so and we can refactor the existing code with helper functions. In the majority use cases, they are non-encryption cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092902#comment-17092902 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415444590 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -463,14 +486,29 @@ ConvertedType convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation } } - private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block) { + private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block, Review comment: Should we keep original addRowGroup() intact and just add a new one with InternalFileEncryptor? There is not much duplicate code if doing so and we can refactor the existing code with helper functions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092901#comment-17092901 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415440161 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -463,14 +486,29 @@ ConvertedType convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation } } - private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block) { + private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block, + InternalFileEncryptor fileEncryptor) throws IOException { + //rowGroup.total_byte_size = ; Review comment: Can we remove this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092900#comment-17092900 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415440161 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java ## @@ -463,14 +486,29 @@ ConvertedType convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation } } - private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block) { + private void addRowGroup(ParquetMetadata parquetMetadata, List rowGroups, BlockMetaData block, + InternalFileEncryptor fileEncryptor) throws IOException { + //rowGroup.total_byte_size = ; Review comment: Can we remove this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092890#comment-17092890 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on a change in pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415429416 ## File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java ## @@ -90,6 +102,10 @@ // Update last two bytes with new page ordinal (instead of creating new page AAD from scratch) public static void quickUpdatePageAAD(byte[] pageAAD, short newPageOrdinal) { +if (newPageOrdinal < 0) { Review comment: Since this is a public method, can we validate pageAAD also? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092885#comment-17092885 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on pull request #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-619640880 > preferably reviewed after the Travis fix is in. @gszadovszky @shangxinli can you apply #777 to the encryption branch. It seems I still don't have permission to do so. @gszadovszky, I did 'git cherry-pick' and then git push to encryption branch', but got permission error. Do I have to create a PR for it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089768#comment-17089768 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on issue #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-617843400 the PR is ready for review This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088391#comment-17088391 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on issue #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-617009069 > can you squash it to one single commit to make the review easier? This can be reviewed as a single commit at https://github.com/apache/parquet-mr/pull/776/files This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088389#comment-17088389 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on issue #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-617008320 preferably reviewed after the Travis fix is in. @gszadovszky @shangxinli can you apply #777 to the encryption branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087801#comment-17087801 ] ASF GitHub Bot commented on PARQUET-1229: - shangxinli commented on issue #776: URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-616599564 Is this ready to review? Since there is no comment yet, can you squash it to one single commit to make the review easier? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072479#comment-17072479 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on pull request #776: PARQUET-1229: Parquet MR encryption URL: https://github.com/apache/parquet-mr/pull/776 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-XXX - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071492#comment-17071492 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on pull request #643: PARQUET-1229 Parquet-MR encryption URL: https://github.com/apache/parquet-mr/pull/643 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837768#comment-16837768 ] Gidon Gershinsky commented on PARQUET-1229: --- Once parquet-mr-1.11.0 is released, a new PR will be created for this jira. > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837203#comment-16837203 ] Ken Wang commented on PARQUET-1229: --- I see the original PR was close, but I didn't find the code in master branch. > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764761#comment-16764761 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky commented on pull request #472: PARQUET-1229: Encryption/decryption support to the existing Parquet classes URL: https://github.com/apache/parquet-mr/pull/472 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > Labels: pull-request-available > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support
[ https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459521#comment-16459521 ] ASF GitHub Bot commented on PARQUET-1229: - ggershinsky opened a new pull request #472: PARQUET-1229: Encryption/decryption support to the existing Parquet classes URL: https://github.com/apache/parquet-mr/pull/472 Depends on #471 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > parquet-mr code changes for encryption support > -- > > Key: PARQUET-1229 > URL: https://issues.apache.org/jira/browse/PARQUET-1229 > Project: Parquet > Issue Type: Sub-task > Components: parquet-mr >Reporter: Gidon Gershinsky >Assignee: Gidon Gershinsky >Priority: Major > > Addition of encryption/decryption support to the existing Parquet classes and > APIs -- This message was sent by Atlassian JIRA (v7.6.3#76005)