ggershinsky commented on a change in pull request #32895: URL: https://github.com/apache/spark/pull/32895#discussion_r650903209
##########
File path: docs/sql-data-sources-parquet.md
##########
@@ -252,6 +252,51 @@ REFRESH TABLE my_table;
 
 </div>
 
+## Columnar Encryption
+
+
+Since Spark 3.2, columnar encryption is supported for Parquet tables with Apache Parquet 1.12.
+
+Parquet uses the envelope encryption practice, where the file parts are encrypted with “data encryption keys” (DEKs), and the DEKs are encrypted with “master encryption keys” (MEKs). The DEKs are randomly generated by Parquet for each encrypted file/column. The MEKs are generated, stored and managed in a Key Management Service (KMS) of user’s choice. Parquet-test [package](https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop/1.12.0/parquet-hadoop-1.12.0-tests.jar) has a mock KMS implementation that allows to run column encryption and decryption without a KMS server:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% highlight scala %}
+
+sc.hadoopConfiguration.set("parquet.encryption.kms.client.class" ,
+                           "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")

Review comment:
As I mentioned above, we don't support any particular KMS. The advantage of the mock InMemoryKMS class is that it provides easy-to-understand demo code that can be tried without a KMS server. But I agree we must stress that this is not a real KMS and should never be used in production.
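
For illustration, here is a minimal sketch of the kind of demo setup being discussed. It assumes a spark-shell session (where `sc` and `spark` are predefined) on Spark 3.2+ with the parquet-hadoop tests jar on the classpath. The key names and values, the `squaresDF` data frame, and the output path are placeholders chosen for this sketch, not something taken from the PR.

```scala
// DEMO ONLY: InMemoryKMS is a mock that keeps master keys in plain
// configuration. It is not a real KMS and must never be used in production.
sc.hadoopConfiguration.set("parquet.encryption.kms.client.class",
  "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")

// Explicit base64-encoded master keys; needed only by the mock KMS.
// (Placeholder values for this sketch.)
sc.hadoopConfiguration.set("parquet.encryption.key.list",
  "keyA:AAECAwQFBgcICQoLDA0ODw==, keyB:AAECAAECAAECAAECAAECAA==")

// Activate Parquet encryption, driven by Hadoop properties.
sc.hadoopConfiguration.set("parquet.crypto.factory.class",
  "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")

// Hypothetical sample data: numbers 1..5 and their squares.
val squaresDF = spark.range(1, 6).selectExpr("id AS value", "id * id AS square")

// Hypothetical output location.
val path = "/tmp/table.parquet.encrypted"

// Column "square" is protected with master key "keyA";
// the file footers are protected with master key "keyB".
squaresDF.write
  .option("parquet.encryption.column.keys", "keyA:square")
  .option("parquet.encryption.footer.key", "keyB")
  .parquet(path)

// Reading back works as long as the same KMS settings are in place.
val decryptedDF = spark.read.parquet(path)
decryptedDF.show()
```

With a real deployment, the `parquet.encryption.kms.client.class` property would instead point to a client class for the user's own KMS, and the explicit key list would not be needed.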
