Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/16521#discussion_r97184574
--- Diff:
common/network-common/src/main/java/org/apache/spark/network/crypto/README.md
---
@@ -0,0 +1,158 @@
+Spark Auth Protocol and AES Encryption Support
+==============================================
+
+This file describes an auth protocol used by Spark as a more secure alternative to DIGEST-MD5. This
+protocol is built on symmetric key encryption, based on the assumption that the two endpoints being
+authenticated share a common secret, which is how Spark authentication currently works. The protocol
+provides mutual authentication, meaning that after the negotiation both parties know that the remote
+side knows the shared secret. The protocol is influenced by the ISO/IEC 9798 protocol, although it's
+not an implementation of it.
+
+This protocol could be replaced with TLS PSK, except no PSK ciphers are available in the currently
+released JREs.
+
+The protocol aims at solving the following shortcomings in Spark's current usage of DIGEST-MD5:
+
+- MD5 is an aging hash algorithm with known weaknesses, and a more secure alternative is desired.
+- DIGEST-MD5 has a pre-defined set of ciphers for which it can generate keys. The only
+  viable, supported cipher these days is 3DES, and a more modern alternative is desired.
+- Encrypting AES session keys with 3DES doesn't solve the issue, since the weakest link
+  in the negotiation would still be MD5 and 3DES.
+
+The protocol assumes that the shared secret is generated and distributed in a secure manner.
+
+The protocol always negotiates encryption keys. If encryption is not desired, the existing
+SASL-based authentication, or no authentication at all, can be chosen instead.
+
+For the messages described below, implementations should support arbitrary sizes for fields that
+don't have a fixed size.
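The README leaves the wire encoding of variable-length fields open. As one possible approach, and purely as an assumption (the 4-byte big-endian length prefix and the `FieldCodec` class are hypothetical, not Spark's actual format), each such field can carry its own length so that the decoder never needs a fixed maximum size:

```java
import java.nio.ByteBuffer;

/** Hypothetical length-prefixed framing (an assumption, not Spark's actual wire format). */
final class FieldCodec {

  private FieldCodec() {}

  /** Writes each field as a 4-byte big-endian length followed by its bytes. */
  static byte[] encode(byte[]... fields) {
    int total = 0;
    for (byte[] field : fields) {
      total += Integer.BYTES + field.length;
    }
    ByteBuffer buf = ByteBuffer.allocate(total); // ByteBuffer is big-endian by default
    for (byte[] field : fields) {
      buf.putInt(field.length);
      buf.put(field);
    }
    return buf.array();
  }

  /** Reads the next length-prefixed field and advances the buffer position. */
  static byte[] decodeNext(ByteBuffer buf) {
    int len = buf.getInt();
    // Reject lengths that the remaining input cannot satisfy.
    if (len < 0 || len > buf.remaining()) {
      throw new IllegalArgumentException("Invalid field length: " + len);
    }
    byte[] field = new byte[len];
    buf.get(field);
    return field;
  }
}
```

Checking the declared length against the bytes actually available, as `decodeNext` does, keeps a malformed or hostile length from triggering an oversized allocation.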
+
+Client Challenge
+----------------
+
+The auth negotiation is started by the client. The client starts by generating an encryption
+key based on the application's shared secret and a nonce.
+
+    KEY = KDF(SECRET, SALT, KEY_LENGTH)
+
+Where:
+- KDF(): a key derivation function that takes a secret, a salt, a configurable number of
+  iterations, and a configurable key length.
+- SALT: a byte sequence used to salt the key derivation function.
+- KEY_LENGTH: length of the encryption key to generate.
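For illustration, the KEY derivation could be sketched with the JDK's `javax.crypto` API, assuming the default PBKDF2 algorithm named under Default Algorithms and a shared secret available as a `String`. The `AuthKeyDerivation` class and its parameters are hypothetical; the iteration count is passed in because the KDF description above makes it configurable:

```java
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper computing KEY = KDF(SECRET, SALT, KEY_LENGTH) with javax.crypto. */
final class AuthKeyDerivation {

  private AuthKeyDerivation() {}

  /**
   * Derives an AES key from the shared secret.
   *
   * @param kdf a SecretKeyFactory algorithm name, e.g. "PBKDF2WithHmacSHA1"
   * @param keyLengthBits KEY_LENGTH, in bits (e.g. 128)
   */
  static SecretKeySpec deriveKey(
      String kdf, String secret, byte[] salt, int iterations, int keyLengthBits)
      throws GeneralSecurityException {
    PBEKeySpec spec = new PBEKeySpec(secret.toCharArray(), salt, iterations, keyLengthBits);
    try {
      SecretKeyFactory factory = SecretKeyFactory.getInstance(kdf);
      byte[] keyBytes = factory.generateSecret(spec).getEncoded();
      // Wrap the derived bytes as an AES key, matching the default cipher.
      return new SecretKeySpec(keyBytes, "AES");
    } finally {
      spec.clearPassword(); // clear the spec's internal copy of the secret
    }
  }
}
```

Both sides calling `deriveKey` with the same secret, salt, iteration count, and key length obtain the same bytes, which is what lets the server reproduce the client's auth key without the key ever crossing the wire.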
+
+
+The client generates a message with the following content:
+
+    CLIENT_CHALLENGE = (
+        APP_ID,
+        KDF,
+        ITERATIONS,
+        CIPHER,
+        KEY_LENGTH,
+        ANONCE,
+        ENC(APP_ID || ANONCE || CHALLENGE))
+
+Where:
+
+- APP_ID: the application ID which the server uses to identify the shared secret.
+- KDF: the key derivation function described above.
+- ITERATIONS: number of iterations to run the KDF when generating keys.
+- CIPHER: the cipher used to encrypt data.
+- KEY_LENGTH: length of the encryption keys to generate, in bits.
+- ANONCE: the nonce used as the salt when generating the auth key.
+- ENC(): an encryption function that uses the cipher and the generated key. This function
+  will also be used in the definition of other messages below.
+- CHALLENGE: a byte sequence used as a challenge to the server.
+- ||: concatenation operator.
+
+When strings are used where byte arrays are expected, the UTF-8 representation of the string
+is assumed.
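Putting these definitions together, the encrypted part of CLIENT_CHALLENGE could be produced as below. This is a sketch under stated assumptions: it uses the JDK's `Cipher` with the default `AES/CTR/NoPadding` transformation instead of commons-crypto, derives a 16-byte IV from ANONCE following the zero-padding rule under Implementation Details, and the class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical helper producing the ENC(APP_ID || ANONCE || CHALLENGE) field. */
final class ClientChallengeSketch {

  private ClientChallengeSketch() {}

  /** The || operator: concatenates byte arrays in order. */
  static byte[] concat(byte[]... parts) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] part : parts) {
      out.write(part, 0, part.length);
    }
    return out.toByteArray();
  }

  /** Encrypts APP_ID (as UTF-8) || ANONCE || CHALLENGE with the derived auth key. */
  static byte[] encryptChallenge(SecretKey key, String appId, byte[] anonce, byte[] challenge)
      throws GeneralSecurityException {
    byte[] plaintext = concat(appId.getBytes(StandardCharsets.UTF_8), anonce, challenge);
    // Assumption: 16-byte AES IV taken from ANONCE, zero-padded if shorter, truncated if longer.
    IvParameterSpec iv = new IvParameterSpec(Arrays.copyOf(anonce, 16));
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, iv);
    return cipher.doFinal(plaintext);
  }
}
```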
+
+To respond to the challenge, the server should interpret the CHALLENGE byte array as an
+arbitrary-length integer and respond with the value of that integer plus one.
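A sketch of the challenge response using `java.math.BigInteger`. The README does not fix the byte order or signedness of the integer, so this assumes the big-endian two's-complement encoding that `BigInteger` uses; the `ChallengeResponse` class is hypothetical:

```java
import java.math.BigInteger;

/** Hypothetical helper for the "challenge plus one" response. */
final class ChallengeResponse {

  private ChallengeResponse() {}

  /** Server side: returns CHALLENGE + 1, encoded as big-endian two's complement. */
  static byte[] respond(byte[] challenge) {
    return new BigInteger(challenge).add(BigInteger.ONE).toByteArray();
  }

  /** Client side: checks that RESPONSE equals CHALLENGE + 1 as integers. */
  static boolean isValidResponse(byte[] challenge, byte[] response) {
    return new BigInteger(response).equals(new BigInteger(challenge).add(BigInteger.ONE));
  }
}
```

Because `toByteArray()` returns the minimal encoding, the response can be longer than the challenge (for example `0x7F` becomes `0x00 0x80`), so comparing values as integers, as `isValidResponse` does, is more robust than comparing raw bytes. An empty challenge would make the `BigInteger` constructor throw `NumberFormatException`.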
+
+
+Server Response And Challenge
+-----------------------------
+
+Once the client challenge is received, the server generates the same auth key using the same
+algorithm the client used. It then verifies the client challenge: if the APP_ID and ANONCE
+recovered by decrypting the encrypted field match the plaintext APP_ID and ANONCE fields, the
+server knows that the client has the shared secret. The server then creates a response to the
+client challenge, to prove that it also has the shared secret, and provides parameters to be
+used when creating the session key.
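Since APP_ID and ANONCE are also sent in the clear, the server knows their lengths and can check the prefix of the decrypted plaintext without any extra framing. A minimal sketch of that check, assuming the encrypted field has already been decrypted with the derived auth key (the `ClientChallengeVerifier` class and its null-on-failure convention are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

/** Hypothetical server-side check of a decrypted APP_ID || ANONCE || CHALLENGE. */
final class ClientChallengeVerifier {

  private ClientChallengeVerifier() {}

  /** Returns the CHALLENGE bytes if the prefix matches, or null if verification fails. */
  static byte[] verify(byte[] decrypted, String appId, byte[] anonce) {
    byte[] appIdBytes = appId.getBytes(StandardCharsets.UTF_8);
    int prefixLen = appIdBytes.length + anonce.length;
    if (decrypted.length < prefixLen) {
      return null;
    }
    byte[] expectedPrefix = new byte[prefixLen];
    System.arraycopy(appIdBytes, 0, expectedPrefix, 0, appIdBytes.length);
    System.arraycopy(anonce, 0, expectedPrefix, appIdBytes.length, anonce.length);
    byte[] actualPrefix = Arrays.copyOf(decrypted, prefixLen);
    // MessageDigest.isEqual compares without short-circuiting on the first mismatch.
    if (!MessageDigest.isEqual(expectedPrefix, actualPrefix)) {
      return null;
    }
    return Arrays.copyOfRange(decrypted, prefixLen, decrypted.length);
  }
}
```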
+
+The following describes the response from the server:
+
+    SERVER_CHALLENGE = (
+        ENC(APP_ID || ANONCE || RESPONSE),
+        ENC(SNONCE),
+        ENC(INIV),
+        ENC(OUTIV))
+
+Where:
+
+- RESPONSE: the server's response to the client challenge.
+- SNONCE: a nonce to be used as salt when generating the session key.
+- INIV: initialization vector used to initialize the input channel of the client.
+- OUTIV: initialization vector used to initialize the output channel of the client.
+
+At this point the server considers the client to be authenticated, and will try to decrypt
+any further data sent by the client using the session key.
+
+
+Default Algorithms
+------------------
+
+Configuration options are available for the KDF and cipher algorithms to use.
+
+The default KDF is "PBKDF2WithHmacSHA1". Users should be able to select any algorithm
+from those supported by the `javax.crypto.SecretKeyFactory` class, as long as they support
+PBEKeySpec when generating keys. The default number of iterations was chosen to take a
+reasonable amount of time on modern CPUs. See the documentation in TransportConf for more
+details.
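Whether a configured KDF name is usable can be probed up front. The sketch below, with a hypothetical `KdfCheck` class, asks `SecretKeyFactory` for the algorithm and feeds it a throwaway `PBEKeySpec`; as an extra assumption beyond what the README states, it also requires the derived key to have the requested length:

```java
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical probe for whether a KDF name can derive keys from a PBEKeySpec. */
final class KdfCheck {

  private KdfCheck() {}

  static boolean supportsPbeKeySpec(String algorithm, int keyLengthBits) {
    // Throwaway inputs: only the algorithm's behavior matters here.
    PBEKeySpec probe = new PBEKeySpec("probe".toCharArray(), new byte[16], 1, keyLengthBits);
    try {
      byte[] derived = SecretKeyFactory.getInstance(algorithm).generateSecret(probe).getEncoded();
      // Extra assumption: also require a key of exactly the requested length.
      return derived != null && derived.length * 8 == keyLengthBits;
    } catch (GeneralSecurityException e) {
      // Unknown algorithm (NoSuchAlgorithmException) or spec rejected (InvalidKeySpecException).
      return false;
    } finally {
      probe.clearPassword();
    }
  }
}
```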
+
+The default cipher algorithm is "AES/CTR/NoPadding". Users should be able to select any
+algorithm supported by the commons-crypto library, as long as it allows the cipher to operate
+in stream mode.
+
+The default key length is 128 bits.
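With a 128-bit (16-byte) key, `AES/CTR/NoPadding` behaves as a stream cipher: the ciphertext has the same length as the plaintext, and data can be fed in chunks of any size without changing the result. The sketch below shows this with the JDK's `javax.crypto.Cipher` rather than commons-crypto, only to stay self-contained; `StreamCipherSketch` is a hypothetical name:

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical demo of AES/CTR/NoPadding used as a stream cipher. */
final class StreamCipherSketch {

  static final String DEFAULT_CIPHER = "AES/CTR/NoPadding";

  private StreamCipherSketch() {}

  /** Encrypts data that arrives in arbitrary-sized chunks, as on a streaming channel. */
  static byte[] encryptChunks(SecretKey key, byte[] iv, byte[]... chunks)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance(DEFAULT_CIPHER);
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] chunk : chunks) {
      append(out, cipher.update(chunk));
    }
    append(out, cipher.doFinal());
    return out.toByteArray();
  }

  /** Cipher.update may return null when it produces no output, so guard against it. */
  private static void append(ByteArrayOutputStream out, byte[] bytes) {
    if (bytes != null) {
      out.write(bytes, 0, bytes.length);
    }
  }
}
```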
+
+
+Implementation Details
+----------------------
+
+The commons-crypto library currently only supports AES ciphers, and requires an initialization
+vector (IV). This first version of the protocol does not explicitly include the IV in the client
+challenge message. Instead, the IV should be derived from the nonce by taking as many bytes as
+the IV needs, padding the IV with zeroes if the nonce is not long enough.
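One reading of this IV rule, assuming the IV is taken from the leading bytes of the nonce, maps directly onto `Arrays.copyOf`, which truncates longer arrays and zero-pads shorter ones. `IvFromNonce` is a hypothetical helper:

```java
import java.util.Arrays;

/** Hypothetical helper deriving an IV from a nonce. */
final class IvFromNonce {

  /** AES uses a 16-byte (128-bit) IV regardless of key length. */
  static final int AES_IV_LENGTH = 16;

  private IvFromNonce() {}

  /** Takes the first ivLength bytes of the nonce, zero-padding if the nonce is shorter. */
  static byte[] deriveIv(byte[] nonce, int ivLength) {
    return Arrays.copyOf(nonce, ivLength);
  }
}
```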
+
+Future versions of the protocol might add support for new ciphers and explicitly include needed
+configuration parameters in the messages.
+
+
+Threat Assessment
+-----------------
+
+The protocol is secure against the following forms of attack:
+
+* Eavesdropping: the protocol is built on the assumption that it's computationally infeasible
+  to calculate the original secret from the encrypted messages. Neither the secret nor any
+  encryption keys are transmitted on the wire, encrypted or not.
+
+* Man-in-the-middle: because the protocol performs mutual authentication, both ends need to
+  know the shared secret to be able to decrypt session data. Even if an attacker is able to insert a
+  malicious "proxy" between endpoints, the attacker won't be able to read any of the data exchanged
+  between client and server, nor insert arbitrary commands for the server to execute.
+
+* Replay attacks: the use of nonces when generating keys prevents an attacker from being able to
--- End diff --
The server doesn't verify whether a nonce has already been used, so it doesn't prevent
replay attacks, right?