sweisdb opened a new pull request, #40969:
URL: https://github.com/apache/spark/pull/40969

   ### What changes were proposed in this pull request?
   
   The current implementation of AES-CBC mode, called via `aes_encrypt` and 
`aes_decrypt`, derives its key with a key derivation function (KDF) based on OpenSSL's 
[EVP_BytesToKey](https://www.openssl.org/docs/man3.0/man3/EVP_BytesToKey.html). 
That function is intended for deriving keys from passwords, and OpenSSL's 
documentation discourages its use: "Newer applications should use a more modern algorithm".
   
   `aes_encrypt` and `aes_decrypt` should use the key directly in CBC mode, as 
they do for both GCM and ECB modes. The output should then be the initialization 
vector (IV) prepended to the ciphertext, as is done with GCM mode:
   `[16-byte randomly generated IV | AES-CBC encrypted ciphertext]`
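The layout above can be illustrated with a minimal sketch using only `javax.crypto`. This is not the code in this PR: the class name `CbcLayoutSketch` and its method names are hypothetical, and PKCS5 padding is an assumption made for the example. The key bytes are passed straight to the cipher with no KDF, and a fresh random IV is prepended to the ciphertext.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration class; not the implementation in this PR.
final class CbcLayoutSketch {
    private static final int IV_LEN = 16; // AES block size in bytes
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns [16-byte random IV | AES-CBC ciphertext]; the key is used directly.
    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);
            // Padding scheme is an assumption for this sketch.
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] out = new byte[IV_LEN + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(ciphertext, 0, out, IV_LEN, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-CBC encryption failed", e);
        }
    }

    // Splits the IV off the front of the input and decrypts the remainder.
    static byte[] decrypt(byte[] key, byte[] ivAndCiphertext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(ivAndCiphertext, 0, IV_LEN));
            return cipher.doFinal(ivAndCiphertext, IV_LEN,
                ivAndCiphertext.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-CBC decryption failed", e);
        }
    }

    public static void main(String[] args) {
        // 32 ASCII bytes, used as a 256-bit key.
        byte[] key = "abcdefghijklmnop12345678ABCDEFGH".getBytes(StandardCharsets.UTF_8);
        byte[] out = encrypt(key, "Spark".getBytes(StandardCharsets.UTF_8));
        System.out.println(out.length); // prints 32: 16-byte IV + one padded block
        System.out.println(new String(decrypt(key, out), StandardCharsets.UTF_8)); // prints Spark
    }
}
```

Because the IV is random, encrypting the same input twice yields different output, while decryption needs only the key and the output itself.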
   
   ### Why are the changes needed?
   
   We want the ciphertext output format to be consistent across modes. 
OpenSSL's EVP_BytesToKey is effectively deprecated, and OpenSSL's own documentation 
says not to use it. Instead, CBC mode will use the supplied key directly and 
generate a random IV.
   
   ### Does this PR introduce _any_ user-facing change?
   
   AES-CBC output generated in the previous format will be incompatible with 
this change. The change that introduced the previous format landed only 
recently, and we want to land this one before CBC mode is used in practice.
   
   
   ### How was this patch tested?
   
   A new unit test in `DataFrameFunctionsSuite` was added to test both GCM and 
CBC modes. Also, a new standalone unit test suite was added in 
`ExpressionImplUtilsSuite` to test all the modes and various key lengths.
   ```
   build/sbt "sql/test:testOnly org.apache.spark.sql.DataFrameFunctionsSuite"
   build/sbt "sql/test:testOnly org.apache.spark.sql.catalyst.expressions.ExpressionImplUtilsSuite"
   ```
   
   CBC values can be verified with `openssl enc` using the following command:
   ```
   echo -n "[INPUT]" | openssl enc -a -e -aes-256-cbc -iv [HEX IV] -K [HEX KEY]
   echo -n "Spark" | openssl enc -a -e -aes-256-cbc -iv f8c832cc9c61bac6151960a58e4edf86 -K 6162636465666768696a6b6c6d6e6f7031323334353637384142434445464748
   ```
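For cross-checking from the JVM side, the following is a rough sketch that encrypts with a fixed IV and key and prints Base64 of the ciphertext alone, which is what the `openssl enc -a` command above emits. The class `OpensslCompareSketch` and its helpers are hypothetical, and PKCS5 padding is assumed here to match the default padding of `openssl enc`. A fixed IV is used only to make the comparison reproducible.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for comparing against `openssl enc`; not part of this PR.
final class OpensslCompareSketch {
    // Decodes a hex string (two characters per byte) into raw bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Encrypts with the given hex key and hex IV; returns Base64 of the ciphertext only.
    static String encryptBase64(String hexKey, String hexIv, String input) {
        try {
            // Padding scheme is an assumption for this sketch.
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(hex(hexKey), "AES"),
                new IvParameterSpec(hex(hexIv)));
            byte[] ciphertext = cipher.doFinal(input.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-CBC encryption failed", e);
        }
    }

    public static void main(String[] args) {
        String key = "6162636465666768696a6b6c6d6e6f7031323334353637384142434445464748";
        String iv = "f8c832cc9c61bac6151960a58e4edf86";
        // Compare this line with the output of the openssl command.
        System.out.println(encryptBase64(key, iv, "Spark"));
    }
}
```

To compare against the `[IV | ciphertext]` output format, the 16-byte IV would need to be stripped from the front of that output first.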


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

