[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...

winningsix Thu, 18 Feb 2016 21:54:06 -0800

Github user winningsix commented on the pull request:

    https://github.com/apache/spark/pull/8880#issuecomment-186069349
  
    Hi @vanzin, I have a suggestion about the usage of configuration. 
    Currently we convert the Spark crypto related configuration to Chimera 
configuration by their prefix (âspark.shuffle.crypto.â).  We can take a 
close look at the configuration about shuffle file encryption.  We could 
classify them into two catalogs: 1) common configurations including 
âspark.shuffle.encryption.enabledâ, 
âspark.shuffle.encryption.keySizeBitsâ, 
âspark.shuffle.encryption.keygen.algorithmâ and 
âspark.shuffle.crypto.cipher.transformationâ; 2) implementations (Chimera 
or Apache common crypto later) related configurations including 
âspark.shuffle.crypto.cipher.classesâ and 
âspark.shuffle.crypto.secure.random.classesâ.  I prefer not to expose 
configurations in the second catalog to users.  At the same time, we need to 
change the method ```toChimeraConf``` in class ```CryptoStreamUtils```.  In 
that method, we will filter all properties related to the library to crypto 
library (Chimera or Apache common crypto later). Then we donât need Spark con
 f prefix and use library prefix only. We could obtain the following benefits.
     1. Now crypto related configuration name is following the conversion 
[Spark_prefix]+[Chimera property key exclude Chimera prefix]. With this change, 
Spark will get rid of the restrictions on the configuration name for crypto 
related library and they still configure the library using the original library 
configuration.
     2. After Chimera is governed by Apache common, we can easily change the 
prefix and donât require other changes.
     3. Itâs good for backward compatible. For example, user configured 
property âspark.shuffle.crypto.secure.random.classesâ with the value of 
âcom.intel.chimera.classnameâ. If the time Chimera is accepted by Apache is 
after what Spark released. If user upgrades, the value has to be updated as 
well.
    
    Changes will include:
     1. Remove configurations âspark.shuffle.crypto.cipher.classesâ and 
âspark.shuffle.crypto.secure.random.classesâ from Spark side
     2. Remove the variable ```SPARK_CHIMERA_CONF_PREFIX``` in 
```CryptoStreamUtils```
     3. Change the implementation of ```toChimeraConf``` in 
```CryptoStreamUtils``` like follows:
    ```
      def toCryptoConf(conf: SparkConf, cryptoLibPrefix: String): Properties = {
        val props = new Properties()
        conf.getAll.foreach { case (k, v) =>
          if (k.startsWith(cryptoLibPrefix)) {
            props.put(k, v)
          }
        }
        props
      }
    ```
    Any thoughts about this change?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request: [SPARK-5682][Core] Add encrypted shuffle in sp...

Reply via email to