hasnain-db opened a new pull request, #42685:
URL: https://github.com/apache/spark/pull/42685

   ### What changes were proposed in this pull request?
   
   This PR adds support for SSL/TLS-based communication for Spark RPCs and 
block transfers, providing an alternative to the existing encryption / 
authentication implementation documented at 
https://spark.apache.org/docs/latest/security.html#spark-rpc-communication-protocol-between-spark-processes
   
   This is based on an existing PR from 2015: 
https://github.com/apache/spark/pull/9416 - with some refactoring and a number 
of updates to make it work with changes to Spark since.
   
   I understand this is a large PR, so here is a high-level summary of the 
changes and a suggested review order:
   
   * Add a dependency on `netty-tcnative-boringssl-static` which provides 
support for faster TLS communication
   * Add documentation for the new functionality in `docs/security.md`, which 
describes the new flags and configuration.
   * Extend `SSLOptions` and `TransportConf` to support the new flags for this 
feature 
   * Extend `TransportContext` to optionally add an SSL based handler if 
configured, and make similar changes in the transport client and server 
factories
   * Add a new API to `ManagedBuffer` to convert objects to a Netty object for 
SSL encoding (since we can't use zero-copy transfers, we can't use 
`convertToNetty()` directly)
   * Add some helper classes for communication:
     * `EncryptedMessageWithHeader` and `SSLMessageEncoder` are quite similar 
to the existing variants but just different enough that it was hard to 
consolidate
     * `SSLFactory` to create the JDK / Netty SSL handlers as appropriate. This 
handles configuration of the protocols, ciphers, etc
     * `ReloadingX509TrustManager` to support trust store reloading. 
   * Change `SecurityManager` to disable the existing authentication/encryption 
mechanisms if this new feature is enabled
   * Update `SparkConf` and `CommandUtils` to pass passwords via env variables 
if needed, to preserve security guarantees (similar to the existing SSL 
password propagation)
   * Update almost all constructor callsites of `SparkTransportConf` to 
propagate SSL options
   * Add tests:
     * Add test keys + certificates in a bunch of places
     * Add an `SSLSampleConfigs` class for sample configurations used in tests
     * Add tests for a bunch of the new classes + features
     * Add new tests for almost all modules that create a `TransportContext` 
which rerun the same tests but with this new feature enabled (this caught a 
bunch of bugs). This will ensure features are compatible with SSL going forward
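   
   To illustrate the point above about passing passwords via env variables, 
here is a minimal sketch in plain Java. It is not the code in this PR: the 
class name `EnvSecretLauncher`, the variable name `EXAMPLE_KEYSTORE_PASSWORD`, 
and the `example.Worker` command are all hypothetical. The idea is that a 
child process's command line is typically visible in process listings, while 
its environment is usually more restricted, so the secret goes into the 
environment and stays out of the arguments.

```java
import java.util.Arrays;
import java.util.List;

public class EnvSecretLauncher {
    // Hypothetical variable name, chosen for this sketch only.
    static final String PASSWORD_ENV = "EXAMPLE_KEYSTORE_PASSWORD";

    /**
     * Builds a launcher for a child process that carries the secret in the
     * child's environment instead of on its command line.
     */
    static ProcessBuilder build(List<String> command, String password) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() starts as a copy of the parent's environment;
        // entries added here are passed to the child when it is started.
        pb.environment().put(PASSWORD_ENV, password);
        return pb;
    }

    public static void main(String[] args) {
        // Hypothetical child command; the process is built but not started.
        ProcessBuilder pb = build(
            Arrays.asList("java", "-cp", "app.jar", "example.Worker"), "s3cret");
        // The command line contains no secret.
        System.out.println(String.join(" ", pb.command()));
        // The secret is available to the child via its environment.
        System.out.println(pb.environment().containsKey(PASSWORD_ENV));
    }
}
```

   In this sketch the child would read the value with 
`System.getenv("EXAMPLE_KEYSTORE_PASSWORD")` rather than parsing it from its 
arguments.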
   
   
   ### Why are the changes needed?
   
   Spark currently does not support TLS/SSL for RPC and block transfers. It is 
helpful to have this as an alternative encryption method for users who may 
need to use a more standard encryption mechanism instead of one that is 
internal to Spark.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this includes new configuration options and updates the associated 
documentation. Besides that, all other aspects of Spark should remain 
unchanged (modulo performance differences).
   
   
   ### How was this patch tested?
   
   Added a bunch of unit tests that pass. 
   
   I also ran some queries locally to ensure they still work.
   
   I verified that traffic was encrypted with TLS in two ways:
   
   * Enabled trace-level logging for Netty and JDK SSL and saw logs confirming 
that TLS handshakes were happening
   * Ran Wireshark on my machine and snooped on traffic while sending queries 
shuffling a fixed string. Without any encryption, I could find that string in 
the network traffic. With this encryption enabled, the string did not show up, 
and the Wireshark logs confirmed that a TLS handshake was happening.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

