hasnain-db opened a new pull request, #42685:
URL: https://github.com/apache/spark/pull/42685
### What changes were proposed in this pull request?

This PR adds support for SSL/TLS-based communication for Spark RPCs and block transfers. It provides an alternative to the existing encryption / authentication implementation documented at https://spark.apache.org/docs/latest/security.html#spark-rpc-communication-protocol-between-spark-processes

This is based on an existing PR from 2015, https://github.com/apache/spark/pull/9416, with some refactoring and a number of updates to make it work with the changes Spark has gone through since then.

I understand this is a large PR, so I've broken it down into a high-level summary of changes and a suggested review order:

* Add a dependency on `netty-tcnative-boringssl-static`, which provides support for faster TLS communication
* Add documentation for the new functionality in `docs/security.md`, which describes the new flags and configuration
* Extend `SSLOptions` and `TransportConf` to support the new flags for this feature
* Extend `TransportContext` to optionally add an SSL-based handler if configured, and make similar changes in the transport client and server factories
* Add a new API to `ManagedBuffer` to convert objects to a Netty object for SSL encoding (since we can't use zero-copy transfers with SSL, we can't use `convertToNetty()` directly)
* Add some helper classes for communication:
  * `EncryptedMessageWithHeader` and `SSLMessageEncoder`, which are quite similar to the existing variants but just different enough that it was hard to consolidate them
  * `SSLFactory`, which creates the JDK / Netty SSL handlers as appropriate and handles configuration of the protocols, ciphers, etc.
  * `ReloadingX509TrustManager`, which supports trust store reloading
* Change `SecurityManager` to disable the existing authentication/encryption mechanisms if this new feature is enabled
* Update `SparkConf` and `CommandUtils` to pass passwords via environment variables if needed, to preserve security guarantees (similar to the existing SSL password propagation)
* Update almost all constructor call sites of `SparkTransportConf` to propagate SSL options
* Add tests:
  * Add test keys and certificates in a number of places
  * Add an `SSLSampleConfigs` class for sample configurations used in tests
  * Add tests for many of the new classes and features
  * Add new tests for almost all modules that create a `TransportContext`, which rerun the same tests with this new feature enabled (this caught a number of bugs). This will help ensure features stay compatible with SSL going forward

### Why are the changes needed?

Spark currently does not support TLS/SSL for RPC and block transfers. It is helpful to have this as an alternative encryption method for users who may need to use more standard encryption mechanisms instead of one that is internal to Spark.

### Does this PR introduce _any_ user-facing change?

Yes, this adds new configuration options and updates the associated documentation. Besides that, all other aspects of Spark should remain unchanged (modulo performance differences).

### How was this patch tested?

Added a number of unit tests, which pass. I also ran some queries locally to ensure they still work. I verified that traffic was encrypted with TLS using two mechanisms:

* Enabled trace-level logging for Netty and JDK SSL, and saw logs confirming that TLS handshakes were happening
* Ran Wireshark on my machine and snooped on traffic while sending queries that shuffled a fixed string. Without any encryption, I could find that string in the network traffic. With this encryption enabled, the string did not show up, and the Wireshark logs confirmed that a TLS handshake was happening.

### Was this patch authored or co-authored using generative AI tooling?
No
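The `SSLFactory` item in the change list above mentions creating JDK / Netty SSL handlers and configuring protocols and ciphers. As a rough illustration only, here is a minimal sketch using just the JDK's `javax.net.ssl` API (no Netty, no `netty-tcnative`). It is not the code in this PR: `SslEngineSketch`, `createEngine`, and `filterSupported` are hypothetical names, and the JDK default key/trust managers stand in for the configured key and trust stores.

```java
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

/**
 * Hypothetical sketch (not Spark's SSLFactory): builds a JDK SSLEngine limited
 * to a requested set of TLS protocols and cipher suites.
 */
public class SslEngineSketch {

    /** Returns the requested names that appear in `supported`, in the caller's order. */
    public static String[] filterSupported(String[] requested, String[] supported) {
        Set<String> supportedSet = new HashSet<>(Arrays.asList(supported));
        List<String> kept = new ArrayList<>();
        for (String name : requested) {
            if (supportedSet.contains(name)) {
                kept.add(name);
            }
        }
        return kept.toArray(new String[0]);
    }

    /**
     * Creates an SSLEngine in client or server mode. A null or empty cipher list
     * keeps the JDK's default cipher suites.
     */
    public static SSLEngine createEngine(boolean clientMode, String[] protocols, String[] ciphers) {
        SSLContext context;
        try {
            context = SSLContext.getInstance("TLS");
            // Passing nulls selects the JDK default key managers, trust managers and
            // SecureRandom; a real setup would pass managers built from configured stores.
            context.init(null, null, null);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not initialize SSLContext", e);
        }
        SSLEngine engine = context.createSSLEngine();
        engine.setUseClientMode(clientMode);

        // setEnabledProtocols throws IllegalArgumentException for unsupported names,
        // so drop them first and fail only if nothing usable remains.
        String[] enabledProtocols = filterSupported(protocols, engine.getSupportedProtocols());
        if (enabledProtocols.length == 0) {
            throw new IllegalArgumentException("None of the requested protocols are supported");
        }
        engine.setEnabledProtocols(enabledProtocols);

        if (ciphers != null && ciphers.length > 0) {
            String[] enabledCiphers = filterSupported(ciphers, engine.getSupportedCipherSuites());
            if (enabledCiphers.length == 0) {
                throw new IllegalArgumentException("None of the requested cipher suites are supported");
            }
            engine.setEnabledCipherSuites(enabledCiphers);
        }
        return engine;
    }

    public static void main(String[] args) {
        SSLEngine engine = createEngine(
            true,
            new String[] {"TLSv1.3", "TLSv1.2", "NotAProtocol"},
            new String[] {"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", "NOT_A_CIPHER"});
        System.out.println("protocols: " + Arrays.toString(engine.getEnabledProtocols()));
        System.out.println("ciphers:   " + Arrays.toString(engine.getEnabledCipherSuites()));
    }
}
```

Filtering against `getSupportedProtocols()` / `getSupportedCipherSuites()` before calling the setters tolerates partially unsupported configuration. The PR description does not say whether the actual implementation filters like this or fails fast on an unknown protocol or cipher.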
