Github user winningsix commented on the pull request:
https://github.com/apache/spark/pull/8880#issuecomment-186069349
Hi @vanzin, I have a suggestion about how we handle configuration.
Currently we convert the Spark crypto-related configuration to Chimera
configuration by its prefix ("spark.shuffle.crypto."). Looking closely at
the shuffle file encryption configuration, we can classify the properties
into two categories: 1) common configuration, including
"spark.shuffle.encryption.enabled",
"spark.shuffle.encryption.keySizeBits",
"spark.shuffle.encryption.keygen.algorithm" and
"spark.shuffle.crypto.cipher.transformation"; 2) implementation-specific
configuration (Chimera now, or Apache Commons crypto later), including
"spark.shuffle.crypto.cipher.classes" and
"spark.shuffle.crypto.secure.random.classes". I would prefer not to expose
the configuration in the second category to users. To do that, we need to
change the method ```toChimeraConf``` in class ```CryptoStreamUtils```. In
that method, we will pass through to the crypto library (Chimera, or Apache
Commons crypto later) only the properties that carry the library's own
prefix. Then we no longer need the Spark conf prefix and can use the library
prefix only. This gives us the following benefits.
1. Currently, crypto-related configuration names follow the convention
[Spark_prefix]+[Chimera property key excluding the Chimera prefix]. With this
change, Spark no longer places restrictions on the configuration names of the
crypto library, and users can still configure the library using its original
configuration keys.
2. Once Chimera is governed by Apache Commons, we can easily change the
prefix without requiring other changes.
3. It is good for backward compatibility. For example, suppose a user has set
the property "spark.shuffle.crypto.secure.random.classes" to
"com.intel.chimera.classname". If Chimera is accepted by Apache after that
Spark version is released, the user has to update the value when upgrading.
Not exposing this property on the Spark side avoids that.
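To make the naming convention in item 1 concrete, here is a minimal sketch of the prefix conversion as described above. It is not the code from the pull request: it uses a plain `Map` in place of `SparkConf` so that it runs without Spark, the object and method names are hypothetical, and the library prefix "chimera.crypto." is an assumption made for illustration.

```scala
import java.util.Properties

// Hypothetical stand-in for the conversion described in item 1.
object PrefixConversionSketch {
  // Spark-side prefix quoted in the comment above.
  val SparkCryptoPrefix = "spark.shuffle.crypto."
  // ASSUMPTION: the library's own prefix; chosen for illustration only.
  val LibraryPrefix = "chimera.crypto."

  // Strip the Spark prefix and prepend the library prefix, so that
  // "spark.shuffle.crypto.X" becomes "chimera.crypto.X".
  def convertByPrefix(settings: Map[String, String]): Properties = {
    val props = new Properties()
    settings.foreach { case (k, v) =>
      if (k.startsWith(SparkCryptoPrefix)) {
        props.setProperty(LibraryPrefix + k.substring(SparkCryptoPrefix.length), v)
      }
    }
    props
  }
}
```

Under this scheme every library key must have a Spark-side counterpart that shares its suffix, which is the restriction on configuration names that item 1 refers to.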
Changes will include:
1. Remove the configurations "spark.shuffle.crypto.cipher.classes" and
"spark.shuffle.crypto.secure.random.classes" from the Spark side.
2. Remove the variable ```SPARK_CHIMERA_CONF_PREFIX``` in
```CryptoStreamUtils```.
3. Change the implementation of ```toChimeraConf``` in
```CryptoStreamUtils``` as follows:
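```
import java.util.Properties
import org.apache.spark.SparkConf

// Copy only the entries whose key starts with the crypto library's own
// prefix; the keys are passed through unchanged.
def toCryptoConf(conf: SparkConf, cryptoLibPrefix: String): Properties = {
  val props = new Properties()
  conf.getAll.foreach { case (k, v) =>
    if (k.startsWith(cryptoLibPrefix)) {
      props.setProperty(k, v)
    }
  }
  props
}
```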
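As a rough illustration of how the proposed filter would behave, the sketch below applies the same logic to a plain sequence of key/value pairs so that it runs without a Spark dependency. The object name, the "chimera.crypto." prefix, and the class name `com.example.HypotheticalCipher` are assumptions made up for the example, not real Chimera settings.

```scala
import java.util.Properties

// Hypothetical, Spark-free variant of the proposed toCryptoConf.
object CryptoConfSketch {
  // Same filtering logic as the proposal, over plain key/value pairs.
  def toCryptoConf(settings: Iterable[(String, String)],
                   cryptoLibPrefix: String): Properties = {
    val props = new Properties()
    settings.foreach { case (k, v) =>
      if (k.startsWith(cryptoLibPrefix)) {
        props.setProperty(k, v)
      }
    }
    props
  }

  def main(args: Array[String]): Unit = {
    val settings = Seq(
      // Common Spark-side setting: not forwarded to the library.
      "spark.shuffle.encryption.enabled" -> "true",
      // ASSUMPTION: a library-prefixed key with a made-up class name.
      "chimera.crypto.cipher.classes" -> "com.example.HypotheticalCipher"
    )
    val props = toCryptoConf(settings, "chimera.crypto.")
    // Print whatever was forwarded to the library, one key per line.
    props.stringPropertyNames().forEach { k =>
      println(s"$k=${props.getProperty(k)}")
    }
  }
}
```

With a real `SparkConf`, the first argument would be `conf.getAll.toSeq`. Because the keys are forwarded unchanged, switching libraries would only require passing a different prefix.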
Any thoughts about this change?