mridulm commented on code in PR #45425:
URL: https://github.com/apache/spark/pull/45425#discussion_r1548815203
##########
common/network-common/src/main/java/org/apache/spark/network/crypto/README.md:
##########
@@ -99,3 +103,13 @@ sessions. It would, however, allow impersonation of future
sessions.
In the event of a pre-shared key compromise, messages would still be
confidential from a passive
observer. Only active adversaries spoofing a session would be able to recover
plaintext.
+Security Changes & Compatibility
+-------------
+
+The original version of this protocol, retroactively called v1.0, did not
apply an HKDF to `sharedSecret` and was
Review Comment:
nit:
```suggestion
The original version of this protocol, retroactively called v1.0, did not
apply an HKDF to `sharedSecret` (to generate the `derivedKey`) and was
```
##########
common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java:
##########
@@ -213,6 +213,11 @@ public boolean encryptionEnabled() {
return conf.getBoolean("spark.network.crypto.enabled", false);
}
+ /**
+ * Version number to be used by the AuthEngine key agreement protocol. Value
values are 1 or 2.
+ */
+ public int authEngineVersion() { return
conf.getInt("spark.network.crypto.authEngineVersion", 2); }
Review Comment:
Make the default 1, to preserve the existing behavior, and allowing backport
to patch versions.
We can document to set this explicitly to to `2` to mitigate this issue.
In a later minor version, we can flip this to `2` by default.
##########
common/network-common/src/main/java/org/apache/spark/network/crypto/AuthEngine.java:
##########
@@ -224,7 +236,7 @@ private TransportCipher generateTransportCipher(
private byte[] getTranscript(AuthMessage... encryptedPublicKeys) {
ByteBuf transcript = Unpooled.buffer(
- Arrays.stream(encryptedPublicKeys).mapToInt(k ->
k.encodedLength()).sum());
+
Arrays.stream(encryptedPublicKeys).mapToInt(AuthMessage::encodedLength).sum());
Review Comment:
Let us minimize optional changes in this PR (this by itself is fine - but in
general, we should reduce the scope of change), given we want to backport to
older versions and want to minimize risk.
##########
docs/security.md:
##########
@@ -149,24 +149,32 @@ secret file agrees with the executors' secret file.
# Network Encryption
-Spark supports two mutually exclusive forms of encryption for RPC connections.
+Spark supports two mutually exclusive forms of encryption for RPC connections:
-The first is an AES-based encryption which relies on a shared secret, and thus
requires
-RPC authentication to also be enabled.
+The **preferred method** uses TLS (aka SSL) encryption via Netty's support for
SSL. Enabling SSL
+requires keys and certificates to be properly configured. SSL is standardized
and considered more
+secure.
-The second is an SSL based encryption mechanism utilizing Netty's support for
SSL. This requires
-keys and certificates to be properly configured. It can be used with or
without the authentication
-mechanism discussed earlier.
-
-One may prefer to use the SSL based encryption in scenarios where compliance
mandates the usage
-of specific protocols; or to leverage the security of a more standard
encryption library. However,
-the AES based encryption is simpler to configure and may be preferred if the
only requirement
-is that data be encrypted in transit.
+The legacy method is an AES-based encryption mechanism relying on a shared
secret. This requires
+RPC authentication to also be enabled. This method uses a bespoke protocol and
it is recommended
+to use SSL instead.
Review Comment:
```suggestion
RPC authentication to also be enabled. This method uses a bespoke protocol
with an older v1 and an updated v2 version. <add details about v1 vs v2>
```
Also, update the config section below on how to configure
`spark.network.crypto.authEngineVersion`, and a ref to that here: along with
incompatibility between `1` and `2` (and all versions prior being `1`
retroactively - without a way to configure it before `3.5.2` and `3.4.3`)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]