[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #615: PARQUET-1373: Encryption key tools

2020-06-30 Thread GitBox


ggershinsky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447508660



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/KeyMaterial.java
##
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.crypto.keytools;
+
+import java.io.IOException;
+import java.io.StringReader;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.parquet.crypto.ParquetCryptoRuntimeException;
+import org.codehaus.jackson.map.ObjectMapper;
+import org.codehaus.jackson.type.TypeReference;
+
+public class KeyMaterial {
+  static final String KEY_MATERIAL_TYPE_FIELD = "keyMaterialType";
+  static final String KEY_MATERIAL_TYPE = "PKMT1";
+  static final String KEY_MATERIAL_INTERNAL_STORAGE_FIELD = "internalStorage";
+
+  static final String FOOTER_KEY_ID_IN_FILE = "footerKey";
+  static final String COLUMN_KEY_ID_IN_FILE_PREFIX = "columnKey";
+  
+  private static final String IS_FOOTER_KEY_FIELD = "isFooterKey";
+  private static final String DOUBLE_WRAPPING_FIELD = "doubleWrapping";
+  private static final String KMS_INSTANCE_ID_FIELD = "kmsInstanceID";
+  private static final String KMS_INSTANCE_URL_FIELD = "kmsInstanceURL";
+  private static final String MASTER_KEY_ID_FIELD = "masterKeyID";
+  private static final String WRAPPED_DEK_FIELD = "wrappedDEK";
+  private static final String KEK_ID_FIELD = "keyEncryptionKeyID";
+  private static final String WRAPPED_KEK_FIELD = "wrappedKEK";
+
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+
+  private final boolean isFooterKey;
+  private final String kmsInstanceID;
+  private final String kmsInstanceURL;
+  private final String masterKeyID;
+  private final boolean isDoubleWrapped;
+  private final String kekID;
+  private final String encodedWrappedKEK;
+  private final String encodedWrappedDEK;
+
+  private KeyMaterial(boolean isFooterKey, String kmsInstanceID, String kmsInstanceURL, String masterKeyID,
+      boolean isDoubleWrapped, String kekID, String encodedWrappedKEK, String encodedWrappedDEK) {
+    this.isFooterKey = isFooterKey;
+    this.kmsInstanceID = kmsInstanceID;
+    this.kmsInstanceURL = kmsInstanceURL;
+    this.masterKeyID = masterKeyID;
+    this.isDoubleWrapped = isDoubleWrapped;
+    this.kekID = kekID;
+    this.encodedWrappedKEK = encodedWrappedKEK;
+    this.encodedWrappedDEK = encodedWrappedDEK;
+  }
+
+  static KeyMaterial parse(Map<String, String> keyMaterialJson) {
+    boolean isFooterKey = Boolean.valueOf(keyMaterialJson.get(IS_FOOTER_KEY_FIELD));
+    String kmsInstanceID = null;
+    String kmsInstanceURL = null;
+    if (isFooterKey) {
+      kmsInstanceID = keyMaterialJson.get(KMS_INSTANCE_ID_FIELD);
+      kmsInstanceURL = keyMaterialJson.get(KMS_INSTANCE_URL_FIELD);
+    }
+    boolean isDoubleWrapped = Boolean.valueOf(keyMaterialJson.get(DOUBLE_WRAPPING_FIELD));
+    String masterKeyID = keyMaterialJson.get(MASTER_KEY_ID_FIELD);
+    String encodedWrappedDEK = keyMaterialJson.get(WRAPPED_DEK_FIELD);
+    String kekID = null;
+    String encodedWrappedKEK = null;
+    if (isDoubleWrapped) {
+      kekID = keyMaterialJson.get(KEK_ID_FIELD);
+      encodedWrappedKEK = keyMaterialJson.get(WRAPPED_KEK_FIELD);
+    }
+
+    return new KeyMaterial(isFooterKey, kmsInstanceID, kmsInstanceURL, masterKeyID, isDoubleWrapped, kekID, encodedWrappedKEK, encodedWrappedDEK);
+  }
+
+  static KeyMaterial parse(String keyMaterialString) {
+    Map<String, String> keyMaterialJson = null;
+    try {
+      keyMaterialJson = OBJECT_MAPPER.readValue(new StringReader(keyMaterialString),
+          new TypeReference<Map<String, String>>() {});
+    } catch (IOException e) {
+      throw new ParquetCryptoRuntimeException("Failed to parse key metadata " + keyMaterialString, e);
+    }
+    String keyMaterialType = keyMaterialJson.get(KEY_MATERIAL_TYPE_FIELD);
+    if (!KEY_MATERIAL_TYPE.equals(keyMaterialType)) {
+      throw new ParquetCryptoRuntimeException("Wrong key material type: " + keyMaterialType +
+          " vs " + KEY_MATERIAL_TYPE);
+    }
+    return parse(keyMaterialJson);
+  }
+
+  static String createSerialized(boolean isFooterKey, String kms

[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #615: PARQUET-1373: Encryption key tools

2020-06-30 Thread GitBox


ggershinsky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447508802



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/KeyMaterial.java
##
@@ -0,0 +1,166 @@
+/*

Review comment:
   > .. With the correct annotations it can map a java object automatically.
   
   This approach seems to be optimal for objects with a fixed set of fields, 
because it searches the json file for every field of the object. In our case, 
many fields are not always written (e.g. the kms instance ID or URL, if the key 
is for a column and not for the footer; there are other examples). Always 
searching for them is an overhead. Moreover, in the case of internal storage, 
we don't need to parse two objects (key metadata and key material), because 
they are the same, so parsing one object is sufficient. The code we have today 
searches for and parses only the relevant objects/fields, so it is optimal in 
that sense, and it is also well-defined in one place. We can add more comments 
to the code to make the field parsing logic crystal clear. What do you think?
   
   > 
   > I think, the format of these json objects is important for compatibility. 
We shall specify them or at least give an example in the comments.
   
   Sounds good. In addition to the comments mentioned above (which will be 
added to the relevant code lines), we will add a class comment to each relevant 
class that documents the structure of the corresponding json.
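
The selective field lookup described in this comment can be sketched as below. This is only an illustration, not the code from the pull request: the class name `KeyMaterialSketch` is hypothetical, the input is assumed to be already parsed into a `Map<String, String>` (so no JSON library is needed), and the sample values are made up. The field names and the `PKMT1` type string are copied from the `KeyMaterial.java` diff quoted in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of conditional field lookup; not the class from the pull request.
class KeyMaterialSketch {
  final boolean isFooterKey;
  final String kmsInstanceID;      // looked up only for footer keys, otherwise null
  final String kmsInstanceURL;     // looked up only for footer keys, otherwise null
  final String masterKeyID;
  final boolean isDoubleWrapped;
  final String kekID;              // looked up only with double wrapping, otherwise null
  final String encodedWrappedKEK;  // looked up only with double wrapping, otherwise null
  final String encodedWrappedDEK;

  private KeyMaterialSketch(Map<String, String> json) {
    // Boolean.parseBoolean(null) is false, so an absent flag is treated as "not set".
    isFooterKey = Boolean.parseBoolean(json.get("isFooterKey"));
    kmsInstanceID = isFooterKey ? json.get("kmsInstanceID") : null;
    kmsInstanceURL = isFooterKey ? json.get("kmsInstanceURL") : null;
    masterKeyID = json.get("masterKeyID");
    encodedWrappedDEK = json.get("wrappedDEK");
    isDoubleWrapped = Boolean.parseBoolean(json.get("doubleWrapping"));
    kekID = isDoubleWrapped ? json.get("keyEncryptionKeyID") : null;
    encodedWrappedKEK = isDoubleWrapped ? json.get("wrappedKEK") : null;
  }

  // Checks the type marker first, then reads only the fields relevant to this key.
  static KeyMaterialSketch parse(Map<String, String> json) {
    String type = json.get("keyMaterialType");
    if (!"PKMT1".equals(type)) {
      throw new IllegalArgumentException("Wrong key material type: " + type + " vs PKMT1");
    }
    return new KeyMaterialSketch(json);
  }

  public static void main(String[] args) {
    // Made-up column key material: single wrapping, no KMS instance fields written.
    Map<String, String> columnKey = new HashMap<>();
    columnKey.put("keyMaterialType", "PKMT1");
    columnKey.put("isFooterKey", "false");
    columnKey.put("doubleWrapping", "false");
    columnKey.put("masterKeyID", "exampleMasterKey");
    columnKey.put("wrappedDEK", "ZXhhbXBsZQ==");

    KeyMaterialSketch km = parse(columnKey);
    System.out.println("footer key: " + km.isFooterKey + ", kms instance: " + km.kmsInstanceID);
  }
}
```

For a column key, the KMS instance fields are never looked up, which is the saving the comment refers to. An annotation-driven mapper would instead try to bind every declared field.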
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (PARQUET-1373) Encryption key management tools

2020-06-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148436#comment-17148436
 ] 

ASF GitHub Bot commented on PARQUET-1373:
-

ggershinsky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447508802





> Encryption key management tools 
> 
>
> Key: PARQUET-1373
> URL: https://issues.apache.org/jira/browse/PARQUET-1373
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>
> Parquet Modular Encryption 
> ([PARQUET-1178|https://issues.apache.org/jira/browse/PARQUET-1178]) provides 
> an API that accepts keys, arbitrary key metadata and key retrieval callbacks, 
> which makes it possible to implement basically any key management policy on 
> top of it. 
> This Jira will add tools that implement a set of best practice elements for 
> key management. This is not an end-to-end key management solution, but rather 
> a set of components that might simplify design and development of an 
> end-to-end solution.
> This tool set is one of many possible. There is no goal to create a single or 
> “standard” toolkit for Parquet encryption keys. Parquet has a Crypto Factory 
> interface ([PARQUET-1817|https://issues.apache.org/jira/browse/PARQUET-1817]) 
> that allows plugging in different implementations of encryption key management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #615: PARQUET-1373: Encryption key tools

2020-06-30 Thread GitBox


ggershinsky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447509073



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/PropertiesDrivenCryptoFactory.java
##
@@ -36,38 +39,62 @@
 import org.apache.parquet.crypto.ParquetCryptoRuntimeException;
 import org.apache.parquet.hadoop.api.WriteSupport.WriteContext;
 import org.apache.parquet.hadoop.metadata.ColumnPath;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import static org.apache.parquet.crypto.keytools.KeyToolkit.stringIsEmpty;
 
 public class PropertiesDrivenCryptoFactory implements EncryptionPropertiesFactory, DecryptionPropertiesFactory {
-
-  public static final String COLUMN_KEYS_PROPERTY_NAME = "encryption.column.keys";
-  public static final String FOOTER_KEY_PROPERTY_NAME = "encryption.footer.key";
-  public static final String ENCRYPTION_ALGORITHM_PROPERTY_NAME = "encryption.algorithm";
-  public static final String PLAINTEXT_FOOTER_PROPERTY_NAME = "encryption.plaintext.footer";
+  private static final Logger LOG = LoggerFactory.getLogger(PropertiesDrivenCryptoFactory.class);
   
-  public static final int DEK_LENGTH = 16;
-
-  private static final SecureRandom random = new SecureRandom();
+  private static final Integer[] ACCEPTABLE_DATA_KEY_LENGTHS = {128, 192, 256};
+  private static final Set<Integer> ACCEPTABLE_DATA_KEY_LENGTHS_SET =
+    new HashSet<>(Arrays.asList(ACCEPTABLE_DATA_KEY_LENGTHS));

Review comment:
   Sure, we'll change this.
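
The reviewer's request is not included in this message, so the change agreed to here is not known. As a rough illustration of what the quoted constants suggest, below is a minimal sketch of validating a requested data key length against a fixed set before generating the key. The class and method names are hypothetical, and treating 128/192/256 as lengths in bits is an assumption based on the removed `DEK_LENGTH = 16` (bytes).

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not the code from the pull request.
class DataKeyLengthSketch {
  // Assumption: the acceptable lengths are expressed in bits.
  private static final Set<Integer> ACCEPTABLE_DATA_KEY_LENGTHS_BITS =
      Collections.unmodifiableSet(new HashSet<Integer>(Arrays.asList(128, 192, 256)));

  private static final SecureRandom RANDOM = new SecureRandom();

  // Rejects unsupported lengths, then fills a byte array of lengthBits / 8 random bytes.
  static byte[] generateDataKey(int lengthBits) {
    if (!ACCEPTABLE_DATA_KEY_LENGTHS_BITS.contains(lengthBits)) {
      throw new IllegalArgumentException("Wrong data key length: " + lengthBits);
    }
    byte[] key = new byte[lengthBits / 8];
    RANDOM.nextBytes(key);
    return key;
  }

  public static void main(String[] args) {
    System.out.println("128-bit key has " + generateDataKey(128).length + " bytes");
  }
}
```

Building the set directly from the values avoids keeping a separate boxed array next to it, though that is only one possible reading of the review thread.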









[jira] [Commented] (PARQUET-1373) Encryption key management tools

2020-06-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148434#comment-17148434
 ] 

ASF GitHub Bot commented on PARQUET-1373:
-

ggershinsky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447508660




[jira] [Commented] (PARQUET-1373) Encryption key management tools

2020-06-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148437#comment-17148437
 ] 

ASF GitHub Bot commented on PARQUET-1373:
-

ggershinsky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447509073





[GitHub] [parquet-mr] ggershinsky commented on a change in pull request #615: PARQUET-1373: Encryption key tools

2020-06-30 Thread GitBox


ggershinsky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447508802



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/KeyMaterial.java
##
@@ -0,0 +1,166 @@
+/*

Review comment:
   > .. With the correct annotations it can map a java object automatically.
   
   This approach seems to be optimal for objects with a fixed set of fields, 
because it searches the json file for every field of the object. In our case, 
many fields are not always written (e.g. the kms instance ID or URL, if the key 
is for a column and not for the footer; there are other examples). Always 
searching for them is an overhead. Moreover, in the case of internal storage, 
we don't need to parse two objects (key metadata and key material), because 
they are the same, so parsing one object is sufficient. The code we have today 
searches for and parses only the relevant objects/fields, so it is optimal in 
that sense, and it is also well-defined in one place. We can add more comments 
to the code to make the field parsing logic crystal clear. What do you think?
   
   > 
   > the format of these json objects is important for compatibility. We shall 
specify them or at least give an example in the comments.
   
   Sounds good. In addition to the comments mentioned above (which will be 
added to the relevant code lines), we will add a class comment to each relevant 
class that documents the structure of the corresponding json.
   









[jira] [Commented] (PARQUET-1373) Encryption key management tools

2020-06-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148442#comment-17148442
 ] 

ASF GitHub Bot commented on PARQUET-1373:
-

ggershinsky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447508802





[GitHub] [parquet-mr] gszadovszky merged pull request #799: Parquet-1872: Add TransCompression command to parquet-tools - Add the…

2020-06-30 Thread GitBox


gszadovszky merged pull request #799:
URL: https://github.com/apache/parquet-mr/pull/799


   







[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #615: PARQUET-1373: Encryption key tools

2020-06-30 Thread GitBox


gszadovszky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447561609



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/KeyMaterial.java
##
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.crypto.keytools;
+
+import java.io.IOException;
+import java.io.StringReader;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.parquet.crypto.ParquetCryptoRuntimeException;
+import org.codehaus.jackson.map.ObjectMapper;
+import org.codehaus.jackson.type.TypeReference;
+
+public class KeyMaterial {
+  static final String KEY_MATERIAL_TYPE_FIELD = "keyMaterialType";
+  static final String KEY_MATERIAL_TYPE = "PKMT1";
+  static final String KEY_MATERIAL_INTERNAL_STORAGE_FIELD = "internalStorage";
+
+  static final String FOOTER_KEY_ID_IN_FILE = "footerKey";
+  static final String COLUMN_KEY_ID_IN_FILE_PREFIX = "columnKey";
+  
+  private static final String IS_FOOTER_KEY_FIELD = "isFooterKey";
+  private static final String DOUBLE_WRAPPING_FIELD = "doubleWrapping";
+  private static final String KMS_INSTANCE_ID_FIELD = "kmsInstanceID";
+  private static final String KMS_INSTANCE_URL_FIELD = "kmsInstanceURL";
+  private static final String MASTER_KEY_ID_FIELD = "masterKeyID";
+  private static final String WRAPPED_DEK_FIELD = "wrappedDEK";
+  private static final String KEK_ID_FIELD = "keyEncryptionKeyID";
+  private static final String WRAPPED_KEK_FIELD = "wrappedKEK";
+
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+
+  private final boolean isFooterKey;
+  private final String kmsInstanceID;
+  private final String kmsInstanceURL;
+  private final String masterKeyID;
+  private final boolean isDoubleWrapped;
+  private final String kekID;
+  private final String encodedWrappedKEK;
+  private final String encodedWrappedDEK;
+
+  private KeyMaterial(boolean isFooterKey, String kmsInstanceID, String kmsInstanceURL, String masterKeyID,
+      boolean isDoubleWrapped, String kekID, String encodedWrappedKEK, String encodedWrappedDEK) {
+    this.isFooterKey = isFooterKey;
+    this.kmsInstanceID = kmsInstanceID;
+    this.kmsInstanceURL = kmsInstanceURL;
+    this.masterKeyID = masterKeyID;
+    this.isDoubleWrapped = isDoubleWrapped;
+    this.kekID = kekID;
+    this.encodedWrappedKEK = encodedWrappedKEK;
+    this.encodedWrappedDEK = encodedWrappedDEK;
+  }
+
+  static KeyMaterial parse(Map<String, String> keyMaterialJson) {
+    boolean isFooterKey = Boolean.valueOf(keyMaterialJson.get(IS_FOOTER_KEY_FIELD));
+    String kmsInstanceID = null;
+    String kmsInstanceURL = null;
+    if (isFooterKey) {
+      kmsInstanceID = keyMaterialJson.get(KMS_INSTANCE_ID_FIELD);
+      kmsInstanceURL = keyMaterialJson.get(KMS_INSTANCE_URL_FIELD);
+    }
+    boolean isDoubleWrapped = Boolean.valueOf(keyMaterialJson.get(DOUBLE_WRAPPING_FIELD));
+    String masterKeyID = keyMaterialJson.get(MASTER_KEY_ID_FIELD);
+    String encodedWrappedDEK = keyMaterialJson.get(WRAPPED_DEK_FIELD);
+    String kekID = null;
+    String encodedWrappedKEK = null;
+    if (isDoubleWrapped) {
+      kekID = keyMaterialJson.get(KEK_ID_FIELD);
+      encodedWrappedKEK = keyMaterialJson.get(WRAPPED_KEK_FIELD);
+    }
+
+    return new KeyMaterial(isFooterKey, kmsInstanceID, kmsInstanceURL, masterKeyID, isDoubleWrapped, kekID, encodedWrappedKEK, encodedWrappedDEK);
+  }
+
+  static KeyMaterial parse(String keyMaterialString) {
+    Map<String, String> keyMaterialJson = null;
+    try {
+      keyMaterialJson = OBJECT_MAPPER.readValue(new StringReader(keyMaterialString),
+          new TypeReference<Map<String, String>>() {});
+    } catch (IOException e) {
+      throw new ParquetCryptoRuntimeException("Failed to parse key metadata " + keyMaterialString, e);
+    }
+    String keyMaterialType = keyMaterialJson.get(KEY_MATERIAL_TYPE_FIELD);
+    if (!KEY_MATERIAL_TYPE.equals(keyMaterialType)) {
+      throw new ParquetCryptoRuntimeException("Wrong key material type: " + keyMaterialType +
+          " vs " + KEY_MATERIAL_TYPE);
+    }
+    return parse(keyMaterialJson);
+  }
+
+  static String createSerialized(boolean isFooterKey, String kms

[jira] [Commented] (PARQUET-1373) Encryption key management tools

2020-06-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148485#comment-17148485
 ] 

ASF GitHub Bot commented on PARQUET-1373:
-

gszadovszky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447561609



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/KeyMaterial.java
##
@@ -0,0 +1,166 @@
+package org.apache.parquet.crypto.keytools;
+
+import java.io.IOException;
+import java.io.StringReader;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.parquet.crypto.ParquetCryptoRuntimeException;
+import org.codehaus.jackson.map.ObjectMapper;
+import org.codehaus.jackson.type.TypeReference;
+
+public class KeyMaterial {
+  static final String KEY_MATERIAL_TYPE_FIELD = "keyMaterialType";
+  static final String KEY_MATERIAL_TYPE = "PKMT1";
+  static final String KEY_MATERIAL_INTERNAL_STORAGE_FIELD = "internalStorage";
+
+  static final String FOOTER_KEY_ID_IN_FILE = "footerKey";
+  static final String COLUMN_KEY_ID_IN_FILE_PREFIX = "columnKey";
+  
+  private static final String IS_FOOTER_KEY_FIELD = "isFooterKey";
+  private static final String DOUBLE_WRAPPING_FIELD = "doubleWrapping";
+  private static final String KMS_INSTANCE_ID_FIELD = "kmsInstanceID";
+  private static final String KMS_INSTANCE_URL_FIELD = "kmsInstanceURL";
+  private static final String MASTER_KEY_ID_FIELD = "masterKeyID";
+  private static final String WRAPPED_DEK_FIELD = "wrappedDEK";
+  private static final String KEK_ID_FIELD = "keyEncryptionKeyID";
+  private static final String WRAPPED_KEK_FIELD = "wrappedKEK";
+
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+
+  private final boolean isFooterKey;
+  private final String kmsInstanceID;
+  private final String kmsInstanceURL;
+  private final String masterKeyID;
+  private final boolean isDoubleWrapped;
+  private final String kekID;
+  private final String encodedWrappedKEK;
+  private final String encodedWrappedDEK;
+
+  private KeyMaterial(boolean isFooterKey, String kmsInstanceID, String kmsInstanceURL, String masterKeyID,
+      boolean isDoubleWrapped, String kekID, String encodedWrappedKEK, String encodedWrappedDEK) {
+    this.isFooterKey = isFooterKey;
+    this.kmsInstanceID = kmsInstanceID;
+    this.kmsInstanceURL = kmsInstanceURL;
+    this.masterKeyID = masterKeyID;
+    this.isDoubleWrapped = isDoubleWrapped;
+    this.kekID = kekID;
+    this.encodedWrappedKEK = encodedWrappedKEK;
+    this.encodedWrappedDEK = encodedWrappedDEK;
+  }
+
+  static KeyMaterial parse(Map<String, String> keyMaterialJson) {
+    boolean isFooterKey = Boolean.valueOf(keyMaterialJson.get(IS_FOOTER_KEY_FIELD));
+    String kmsInstanceID = null;
+    String kmsInstanceURL = null;
+    if (isFooterKey) {
+      kmsInstanceID = keyMaterialJson.get(KMS_INSTANCE_ID_FIELD);
+      kmsInstanceURL = keyMaterialJson.get(KMS_INSTANCE_URL_FIELD);
+    }
+    boolean isDoubleWrapped = Boolean.valueOf(keyMaterialJson.get(DOUBLE_WRAPPING_FIELD));
+    String masterKeyID = keyMaterialJson.get(MASTER_KEY_ID_FIELD);
+    String encodedWrappedDEK = keyMaterialJson.get(WRAPPED_DEK_FIELD);
+    String kekID = null;
+    String encodedWrappedKEK = null;
+    if (isDoubleWrapped) {
+      kekID = keyMaterialJson.get(KEK_ID_FIELD);
+      encodedWrappedKEK = keyMaterialJson.get(WRAPPED_KEK_FIELD);
+    }
+
+    return new KeyMaterial(isFooterKey, kmsInstanceID, kmsInstanceURL, masterKeyID, isDoubleWrapped, kekID, encodedWrappedKEK, encodedWrappedDEK);
+  }
+
+  static KeyMaterial parse(String keyMaterialString) {
+    Map<String, String> keyMaterialJson = null;
+    try {
+      keyMaterialJson = OBJECT_MAPPER.readValue(new StringReader(keyMaterialString),
+          new TypeReference<Map<String, String>>() {});
+    } catch (IOException e) {
+      throw new ParquetCryptoRuntimeException("Failed to parse key metadata " + keyMaterialString, e);
+    }
+    String keyMaterialType = keyMaterialJson.get(KEY_MATERIAL_TYPE_FIELD);
+    if (!KEY_MATERIAL_TYPE.equals(keyMaterial

[GitHub] [parquet-mr] gszadovszky commented on a change in pull request #615: PARQUET-1373: Encryption key tools

2020-06-30 Thread GitBox


gszadovszky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447562824



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/KeyMaterial.java
##
@@ -0,0 +1,166 @@
+/*

Review comment:
   In this case I am fine with the current solution. The main reason I 
suggested this was to improve readability. The documentation of the JSON 
formats will solve this issue.
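
As a rough illustration of what such format documentation might describe, here is a minimal sketch of a key material JSON object. The field names are taken from the constants in the KeyMaterial.java diff quoted in this thread; every value is a made-up placeholder, and the assumption that all values are stored as JSON strings is inferred from the parsing code rather than confirmed. The class name and the hand-rolled serializer are hypothetical and are not part of the pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class KeyMaterialJsonSketch {
  // Fields of a single-wrapped footer key. All values are placeholders.
  static Map<String, String> footerKeyExample() {
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("keyMaterialType", "PKMT1");
    fields.put("isFooterKey", "true");
    // The KMS instance fields are only read by parse() when isFooterKey is true.
    fields.put("kmsInstanceID", "kms-instance-1");
    fields.put("kmsInstanceURL", "https://kms.example.com");
    fields.put("masterKeyID", "master-key-1");
    // With doubleWrapping false, keyEncryptionKeyID and wrappedKEK are omitted.
    fields.put("doubleWrapping", "false");
    fields.put("wrappedDEK", "BASE64_WRAPPED_DEK");
    return fields;
  }

  // Naive serializer: assumes keys and values need no JSON escaping.
  static String toJson(Map<String, String> fields) {
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, String> entry : fields.entrySet()) {
      json.add("\"" + entry.getKey() + "\":\"" + entry.getValue() + "\"");
    }
    return json.toString();
  }

  public static void main(String[] args) {
    System.out.println(toJson(footerKeyExample()));
  }
}
```

A document in this shape carries the `PKMT1` type tag that `parse(String)` checks before reading the remaining fields.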





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (PARQUET-1373) Encryption key management tools

2020-06-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148488#comment-17148488
 ] 

ASF GitHub Bot commented on PARQUET-1373:
-

gszadovszky commented on a change in pull request #615:
URL: https://github.com/apache/parquet-mr/pull/615#discussion_r447562824



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/KeyMaterial.java
##
@@ -0,0 +1,166 @@
+/*

Review comment:
   In this case I am fine with the current solution. The main reason I 
suggested this was to improve readability. The documentation of the JSON 
formats will solve this issue.







> Encryption key management tools 
> 
>
> Key: PARQUET-1373
> URL: https://issues.apache.org/jira/browse/PARQUET-1373
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>
> Parquet Modular Encryption 
> ([PARQUET-1178|https://issues.apache.org/jira/browse/PARQUET-1178]) provides 
> an API that accepts keys, arbitrary key metadata and key retrieval callbacks, 
> which makes it possible to implement basically any key management policy on 
> top of it. This Jira will add tools that implement a set of best-practice 
> elements for key management. This is not an end-to-end key management 
> solution, but rather a set of components that might simplify the design and 
> development of an end-to-end solution.
> This tool set is one of many possible. There is no goal to create a single or 
> “standard” toolkit for Parquet encryption keys. Parquet has a Crypto Factory 
> interface ([PARQUET-1817|https://issues.apache.org/jira/browse/PARQUET-1817]) 
> that allows plugging in different implementations of encryption key management.
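
The `doubleWrapping`, `wrappedKEK` and `wrappedDEK` fields in the KeyMaterial.java diff quoted in this thread suggest an envelope scheme: a data encryption key (DEK) is wrapped with a key encryption key (KEK), which is in turn wrapped with a master key. The sketch below illustrates that idea only. It assumes AES-GCM for wrapping and a local byte array standing in for a master key that would normally stay inside a KMS; the class and method names are hypothetical and are not the API added by the pull request.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class DoubleWrappingSketch {
  private static final SecureRandom RANDOM = new SecureRandom();
  private static final int NONCE_LENGTH = 12;   // bytes
  private static final int TAG_LENGTH = 128;    // bits

  static byte[] randomKey() {
    byte[] key = new byte[16];                  // 128-bit AES key
    RANDOM.nextBytes(key);
    return key;
  }

  // Encrypts keyToWrap under wrappingKey; returns Base64(nonce || ciphertext || tag).
  static String wrap(byte[] keyToWrap, byte[] wrappingKey) {
    try {
      byte[] nonce = new byte[NONCE_LENGTH];
      RANDOM.nextBytes(nonce);
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(wrappingKey, "AES"),
          new GCMParameterSpec(TAG_LENGTH, nonce));
      byte[] ciphertext = cipher.doFinal(keyToWrap);
      byte[] out = new byte[NONCE_LENGTH + ciphertext.length];
      System.arraycopy(nonce, 0, out, 0, NONCE_LENGTH);
      System.arraycopy(ciphertext, 0, out, NONCE_LENGTH, ciphertext.length);
      return Base64.getEncoder().encodeToString(out);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Failed to wrap key", e);
    }
  }

  // Reverses wrap(): splits off the nonce, then decrypts and verifies the tag.
  static byte[] unwrap(String encodedWrappedKey, byte[] wrappingKey) {
    try {
      byte[] in = Base64.getDecoder().decode(encodedWrappedKey);
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(wrappingKey, "AES"),
          new GCMParameterSpec(TAG_LENGTH, in, 0, NONCE_LENGTH));
      return cipher.doFinal(in, NONCE_LENGTH, in.length - NONCE_LENGTH);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Failed to unwrap key", e);
    }
  }

  public static void main(String[] args) {
    byte[] masterKey = randomKey();             // stand-in for a key held by a KMS
    byte[] kek = randomKey();
    byte[] dek = randomKey();

    String encodedWrappedKEK = wrap(kek, masterKey);
    String encodedWrappedDEK = wrap(dek, kek);

    // A reader unwraps the KEK with the master key, then the DEK with the KEK.
    byte[] recoveredDek = unwrap(encodedWrappedDEK, unwrap(encodedWrappedKEK, masterKey));
    System.out.println("DEK recovered: " + Arrays.equals(dek, recoveredDek));
  }
}
```

In a scheme like this, only the KEK is wrapped with the master key, so one KMS interaction can cover many DEKs wrapped with the same KEK.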



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-06-30 Thread Uwe L. Korn
I'm also in favor of disabling LZ4 support for now. Having to deal with broken 
files, or with detecting various incompatible implementations, will do more 
harm in the long term than not supporting LZ4 for a while. Snappy is more 
widely used than LZ4 in this category, as it has been available since the 
inception of Parquet, and should thus be considered a viable alternative.

Cheers
Uwe

On Mon, Jun 29, 2020, at 11:48 PM, Wes McKinney wrote:
> On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou wrote:
> >
> >
> > On 25/06/2020 at 00:02, Wes McKinney wrote:
> > > hi folks,
> > >
> > > (cross-posting to dev@arrow and dev@parquet since there are
> > > stakeholders in both places)
> > >
> > > It seems there are still problems at least with the C++ implementation
> > > of LZ4 compression in Parquet files
> > >
> > > https://issues.apache.org/jira/browse/PARQUET-1241
> > > https://issues.apache.org/jira/browse/PARQUET-1878
> >
> > I don't have any particular opinion on how to solve the LZ4 issue, but
> > I'd like to mention that LZ4 and ZStandard are the two most efficient
> > compression algorithms available, and they span different parts of the
> > speed/compression spectrum, so it would be a pity to disable one of them.
> 
> That's true; however, I think it's worse to write LZ4-compressed files
> that cannot be read by other Parquet implementations (if that is what's
> happening, as I understand it). If we are indeed shipping something
> broken, then we should either fix it or disable it until it can be
> fixed.
> 
> > Regards
> >
> > Antoine.
>