[GitHub] [spark] srowen commented on a change in pull request #32895: [SPARK-35658][DOCS] Document Parquet encryption feature in Spark SQL

GitBox Wed, 14 Jul 2021 07:08:14 -0700


srowen commented on a change in pull request #32895:
URL: https://github.com/apache/spark/pull/32895#discussion_r669649591




##########
File path: docs/sql-data-sources-parquet.md
##########
@@ -252,6 +252,71 @@ REFRESH TABLE my_table;
 
 </div>
 
+## Columnar Encryption
+
+
+Since Spark 3.2, columnar encryption is supported for Parquet tables with Apache Parquet 1.12+.
+
+Parquet uses the envelope encryption practice, where file parts are encrypted with "data encryption keys" (DEKs), and the DEKs are encrypted with "master encryption keys" (MEKs). The DEKs are randomly generated by Parquet for each encrypted file/column. The MEKs are generated, stored and managed in a Key Management Service (KMS) of the user’s choice. The Parquet Maven [repository](https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop/1.12.0/) has a jar with a mock KMS implementation that allows running columnar encryption and decryption with spark-shell alone, without deploying a KMS server (download the `parquet-hadoop-tests.jar` file and place it in the Spark `jars` folder):

Review comment:
       I know it's not in Spark, but why repeat 'how to include a JAR' here?
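For readers unfamiliar with the envelope encryption practice that the proposed doc text describes, the following is a minimal, conceptual Java sketch using only the JDK's `javax.crypto` API. It is not Parquet's implementation: the class and method names (`EnvelopeEncryptionSketch`, `encryptColumn`, `decryptColumn`) are hypothetical, and the choice of AES-GCM with a 12-byte nonce is an assumption made for illustration. A fresh random DEK encrypts each column's bytes, and the DEK itself is encrypted ("wrapped") with an MEK. In a real deployment the MEK would stay inside the KMS, which would wrap and unwrap the DEKs.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical, conceptual sketch of envelope encryption; NOT Parquet's code.
public class EnvelopeEncryptionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;  // 96-bit nonce, the usual size for AES-GCM
    private static final int TAG_BITS = 128; // GCM authentication tag length

    // Generates a random 128-bit AES key (used here for both MEKs and DEKs).
    static SecretKey newAesKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128, RANDOM);
        return gen.generateKey();
    }

    // AES-GCM encryption; the output layout is IV || ciphertext || tag.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext);
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    // Reverses encrypt(); throws if the key is wrong or the data was tampered with.
    static byte[] decrypt(SecretKey key, byte[] ivAndCiphertext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_BYTES));
        return cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
    }

    // What a writer would keep for one encrypted column: the data encrypted
    // with a fresh DEK, plus that DEK wrapped (encrypted) with the MEK.
    static final class EncryptedColumn {
        final byte[] wrappedDek;
        final byte[] ciphertext;

        EncryptedColumn(byte[] wrappedDek, byte[] ciphertext) {
            this.wrappedDek = wrappedDek;
            this.ciphertext = ciphertext;
        }
    }

    static EncryptedColumn encryptColumn(SecretKey mek, byte[] columnData)
            throws GeneralSecurityException {
        SecretKey dek = newAesKey();                        // random DEK per column
        byte[] ciphertext = encrypt(dek, columnData);       // column data encrypted with the DEK
        byte[] wrappedDek = encrypt(mek, dek.getEncoded()); // DEK encrypted with the MEK
        return new EncryptedColumn(wrappedDek, ciphertext);
    }

    static byte[] decryptColumn(SecretKey mek, EncryptedColumn col)
            throws GeneralSecurityException {
        byte[] dekBytes = decrypt(mek, col.wrappedDek);     // unwrap the DEK (a KMS call in practice)
        SecretKey dek = new SecretKeySpec(dekBytes, "AES");
        return decrypt(dek, col.ciphertext);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey mek = newAesKey(); // stands in for a master key held by a KMS
        byte[] data = "1,4,9,16".getBytes(StandardCharsets.UTF_8);
        EncryptedColumn col = encryptColumn(mek, data);
        System.out.println(new String(decryptColumn(mek, col), StandardCharsets.UTF_8));
    }
}
```

Because only the small wrapped DEKs depend on the MEK, rotating or revoking a master key in the KMS does not require re-encrypting the column data itself.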



