Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16521#discussion_r97184574
  
    --- Diff: 
common/network-common/src/main/java/org/apache/spark/network/crypto/README.md 
---
    @@ -0,0 +1,158 @@
    +Spark Auth Protocol and AES Encryption Support
    +==============================================
    +
    +This file describes an auth protocol used by Spark as a more secure 
alternative to DIGEST-MD5. This
    +protocol is built on symmetric key encryption, based on the assumption 
that the two endpoints being
    +authenticated share a common secret, which is how Spark authentication 
currently works. The protocol
    +provides mutual authentication, meaning that after the negotiation both 
parties know that the remote
    +side knows the shared secret. The protocol is influenced by the ISO/IEC 
9798 protocol, although it's
    +not an implementation of it.
    +
    +This protocol could be replaced with TLS PSK, except no PSK ciphers are 
available in the currently
    +released JREs.
    +
    +The protocol aims at solving the following shortcomings in Spark's current 
usage of DIGEST-MD5:
    +
    +- MD5 is an aging hash algorithm with known weaknesses, and a more secure 
alternative is desired.
    +- DIGEST-MD5 has a pre-defined set of ciphers for which it can generate 
keys. The only
    +  viable, supported cipher these days is 3DES, and a more modern 
alternative is desired.
    +- Encrypting AES session keys with 3DES doesn't solve the issue, since the 
weakest link
    +  in the negotiation would still be MD5 and 3DES.
    +
    +The protocol assumes that the shared secret is generated and distributed 
in a secure manner.
    +
    +The protocol always negotiates encryption keys. If encryption is not 
desired, the existing
    +SASL-based authentication, or no authentication at all, can be chosen 
instead.
    +
    +When messages are described below, it's expected that the implementation 
should support
    +arbitrary sizes for fields that don't have a fixed size.
    +
    +Client Challenge
    +----------------
    +
    +The auth negotiation is started by the client. The client starts by 
generating an encryption
    +key based on the application's shared secret, and a nonce.
    +
    +    KEY = KDF(SECRET, SALT, KEY_LENGTH)
    +
    +Where:
    +- KDF(): a key derivation function that takes a secret, a salt, a 
configurable number of
    +  iterations, and a configurable key length.
    +- SALT: a byte sequence used to salt the key derivation function.
    +- KEY_LENGTH: length of the encryption key to generate.
    +
    +
    +The client generates a message with the following content:
    +
    +    CLIENT_CHALLENGE = (
    +        APP_ID,
    +        KDF,
    +        ITERATIONS,
    +        CIPHER,
    +        KEY_LENGTH,
    +        ANONCE,
    +        ENC(APP_ID || ANONCE || CHALLENGE))
    +
    +Where:
    +
    +- APP_ID: the application ID which the server uses to identify the shared 
secret.
    +- KDF: the key derivation function described above.
    +- ITERATIONS: number of iterations to run the KDF when generating keys.
    +- CIPHER: the cipher used to encrypt data.
    +- KEY_LENGTH: length of the encryption keys to generate, in bits.
    +- ANONCE: the nonce used as the salt when generating the auth key.
    +- ENC(): an encryption function that uses the cipher and the generated 
key. This function
    +  will also be used in the definition of other messages below.
    +- CHALLENGE: a byte sequence used as a challenge to the server.
    +- ||: concatenation operator.
    +
    +When strings are used where byte arrays are expected, the UTF-8 
representation of the string
    +is assumed.
    +
    +To respond to the challenge, the server should consider the byte array as 
representing an
    +arbitrary-length integer, and respond with the value of the integer plus 
one.
    +
    +
    +Server Response And Challenge
    +-----------------------------
    +
    +Once the client challenge is received, the server will generate the same 
auth key by
    +using the same algorithm the client has used. It will then verify the 
client challenge:
    +if the APP_ID and ANONCE fields match, the server knows that the client 
has the shared
    +secret. The server then creates a response to the client challenge, to 
prove that it also
    +has the secret key, and provides parameters to be used when creating the 
session key.
    +
    +The following describes the response from the server:
    +
    +    SERVER_CHALLENGE = (
    +        ENC(APP_ID || ANONCE || RESPONSE),
    +        ENC(SNONCE),
    +        ENC(INIV),
    +        ENC(OUTIV))
    +
    +Where:
    +
    +- RESPONSE: the server's response to the client challenge.
    +- SNONCE: a nonce to be used as salt when generating the session key.
    +- INIV: initialization vector used to initialize the input channel of the 
client.
    +- OUTIV: initialization vector used to initialize the output channel of 
the client.
    +
    +At this point the server considers the client to be authenticated, and 
will try to
    +decrypt any data further sent by the client using the session key.
    +
    +
    +Default Algorithms
    +------------------
    +
    +Configuration options are available for the KDF and cipher algorithms to 
use.
    +
    +The default KDF is "PBKDF2WithHmacSHA1". Users should be able to select 
any algorithm
    +from those supported by the `javax.crypto.SecretKeyFactory` class, as long 
as they support
    +PBEKeySpec when generating keys. The default number of iterations was 
chosen to take a
    +reasonable amount of time on modern CPUs. See the documentation in 
TransportConf for more
    +details.
    +
    +The default cipher algorithm is "AES/CTR/NoPadding". Users should be able 
to select any
    +algorithm supported by the commons-crypto library. It should allow the 
cipher to operate
    +in stream mode.
    +
    +The default key length is 128 (bits).
    +
    +
    +Implementation Details
    +----------------------
    +
    +The commons-crypto library currently only supports AES ciphers, and 
requires an initialization
    +vector (IV). This first version of the protocol does not explicitly 
include the IV in the client
    +challenge message. Instead, the IV should be derived from the nonce, 
including the needed bytes, and
    +padding the IV with zeroes in case the nonce is not long enough.
    +
    +Future versions of the protocol might add support for new ciphers and 
explicitly include needed
    +configuration parameters in the messages.
    +
    +
    +Threat Assessment
    +-----------------
    +
    +The protocol is secure against different forms of attack:
    +
    +* Eavesdropping: the protocol is built on the assumption that it's 
computationally infeasible
    +  to calculate the original secret from the encrypted messages. Neither 
the secret nor any
    +  encryption keys are transmitted on the wire, encrypted or not.
    +
    +* Man-in-the-middle: because the protocol performs mutual authentication, 
both ends need to
    +  know the shared secret to be able to decrypt session data. Even if an 
attacker is able to insert a
    +  malicious "proxy" between endpoints, the attacker won't be able to read 
any of the data exchanged
    +  between client and server, nor insert arbitrary commands for the server 
to execute.
    +
    +* Replay attacks: the use of nonces when generating keys prevents an 
attacker from being able to
    --- End diff --
    
    The server doesn't verify a nonce was used or not, so it don't prevents 
replay attacks. Right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to