[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-157908221 @winningsix could you update the patch so it merges and compiles? Also, the PR title and description need to be updated after the latest changes. While you're at it, please follow the PR title convention (see https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155804412 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155804445 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155805866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45640/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155805171 **[Test build #45640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45640/consoleFull)** for PR 8880 at commit [`d08af32`](https://github.com/apache/spark/commit/d08af32b3d8fe2a2c4fa5614ddfc86758b2eeeb8). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155805865 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155805861 **[Test build #45640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45640/consoleFull)** for PR 8880 at commit [`d08af32`](https://github.com/apache/spark/commit/d08af32b3d8fe2a2c4fa5614ddfc86758b2eeeb8). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155079340 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155079079 **[Test build #45369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45369/consoleFull)** for PR 8880 at commit [`3c810ac`](https://github.com/apache/spark/commit/3c810ace342ac77bfbecc50e1def635c6afa8151). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155077100 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155077066 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155075773 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155075706 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r44361794 --- Diff: core/src/main/scala/org/apache/spark/crypto/CipherSuite.scala --- @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +/** + * Defines properties of a CipherSuite. Modeled after the ciphers in [[javax.crypto.Cipher]] + * @param namename of cipher suite, as in [[javax.crypto.Cipher]] + * @param algoBlockSize size of an algorithm block in bytes + */ +private[spark] case class CipherSuite(name: String, algoBlockSize: Int) { + private var _unknownValue: Integer = _ + + def unknownValue: Integer = _unknownValue + + def unknownValue_=(unknownValue: Integer): Unit = { --- End diff -- Isn't this just the same as declaring a public `var _unknownValue: Integer = _`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155117504 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-155117279 **[Test build #45369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45369/consoleFull)** for PR 8880 at commit [`3c810ac`](https://github.com/apache/spark/commit/3c810ace342ac77bfbecc50e1def635c6afa8151). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r44362026 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables --- End diff -- They're either constants or variables... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user winningsix commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r44354373 --- Diff: core/src/test/scala/org/apache/spark/crypto/JceAesCtrCryptoCodecSuite.scala --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{ByteArrayInputStream, BufferedOutputStream, ByteArrayOutputStream} +import java.security.SecureRandom + +import org.apache.spark.crypto.CommonConfigurationKeys._ +import org.apache.spark.{SparkFunSuite, SparkConf, Logging} + +/** + * test JceAesCtrCryptoCodec + */ +class JceAesCtrCryptoCodecSuite extends SparkFunSuite with Logging { + + test("TestCryptoCodecSuite") { +val random = new SecureRandom +val dataLen = 1000 +val inputData = new Array[Byte](dataLen) +val outputData = new Array[Byte](dataLen) +random.nextBytes(inputData) +// encrypt +val sparkConf = new SparkConf + sparkConf.set(SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY, + classOf[JceAesCtrCryptoCodec].getName) +val cipherSuite = new CipherSuite("AES/CTR/NoPadding", 16) +val codec = new JceAesCtrCryptoCodec(sparkConf) +val aos = new ByteArrayOutputStream +val bos = new BufferedOutputStream(aos) +val key = new Array[Byte](16) +val iv = new Array[Byte](16) +random.nextBytes(key) +random.nextBytes(iv) + +val cos = new CryptoOutputStream(bos, codec, 1024, key, iv) +cos.write(inputData, 0, inputData.length) --- End diff -- Tests added in the cryptoCodec level in my latest updated pull request. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-150155027 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-150155055 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-150178787 **[Test build #44144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44144/consoleFull)** for PR 8880 at commit [`135b380`](https://github.com/apache/spark/commit/135b3809982c9b66f479a855fac944799e56f7fc). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-150178851 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-150178852 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44144/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-150156455 **[Test build #44144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44144/consoleFull)** for PR 8880 at commit [`135b380`](https://github.com/apache/spark/commit/135b3809982c9b66f479a855fac944799e56f7fc). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149942075 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149942029 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149945368 **[Test build #44068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44068/consoleFull)** for PR 8880 at commit [`a75931f`](https://github.com/apache/spark/commit/a75931fdc57101ac4cef5f4467f1a5ccc6947b0a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149980692 **[Test build #44068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44068/consoleFull)** for PR 8880 at commit [`a75931f`](https://github.com/apache/spark/commit/a75931fdc57101ac4cef5f4467f1a5ccc6947b0a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149980939 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44068/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149980937 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149850877 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44052/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149850819 **[Test build #44052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44052/consoleFull)** for PR 8880 at commit [`de92e60`](https://github.com/apache/spark/commit/de92e60abc053ceb50064878fd93ff67c4bb2eec). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149850876 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149822285 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149822365 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-149824547 **[Test build #44052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44052/consoleFull)** for PR 8880 at commit [`de92e60`](https://github.com/apache/spark/commit/de92e60abc053ceb50064878fd93ff67c4bb2eec). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42050263 --- Diff: core/src/main/scala/org/apache/spark/crypto/CipherSuite.scala --- @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import com.google.common.base.Objects + +/** + * Defines properties of a CipherSuite. Modeled after the ciphers in [[javax.crypto.Cipher]] + * @param namename of cipher suite, as in [[javax.crypto.Cipher]] + * @param algoBlockSize size of an algorithm block in bytes + */ +private[spark] case class CipherSuite(name: String, algoBlockSize: Int) { + private var _unknownValue: Integer = _ + + def unknownValue: Integer = _unknownValue + + def unknownValue_=(unknownValue: Integer): Unit = { +_unknownValue = unknownValue + } + + override def toString(): String = { +Objects.toStringHelper(this).add("name", name).add("algoBlockSize", algoBlockSize).toString --- End diff -- This is a case class so I'm not sure you even need to override `toString`. (Unless you really want the property names there.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42051744 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables + */ +private[spark] object CommonConfigurationKeys { + val SPARK_SHUFFLE_TOKEN = new Text("SPARK_SHUFFLE_TOKEN") + val SPARK_SECURITY_CRYPTO_BUFFER_SIZE_DEFAULT = 8192 + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT = "AES/CTR/NoPadding" + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY = "spark.security.crypto.cipher.suite" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX = "spark.security.crypto.codec.classes" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY = +SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX + CipherSuite.AES_CTR_NOPADDING.getConfigSuffix() + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_KEY = "spark.security.java.secure.random" + + ".algorithm" + val SPARK_SECURITY_CRYPTO_JCE_PROVIDER_KEY = "spark.security.crypto.jce.provider" + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_DEFAULT = "SHA1PRNG" + val SPARK_SECURITY_SECURE_RANDOM_IMPL_KEY = "spark.security.secure.random.impl" --- End diff -- Not used anywhere. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42051693 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables + */ +private[spark] object CommonConfigurationKeys { + val SPARK_SHUFFLE_TOKEN = new Text("SPARK_SHUFFLE_TOKEN") + val SPARK_SECURITY_CRYPTO_BUFFER_SIZE_DEFAULT = 8192 + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT = "AES/CTR/NoPadding" + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY = "spark.security.crypto.cipher.suite" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX = "spark.security.crypto.codec.classes" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY = +SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX + CipherSuite.AES_CTR_NOPADDING.getConfigSuffix() + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_KEY = "spark.security.java.secure.random" + + ".algorithm" + val SPARK_SECURITY_CRYPTO_JCE_PROVIDER_KEY = "spark.security.crypto.jce.provider" + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_DEFAULT = "SHA1PRNG" + val SPARK_SECURITY_SECURE_RANDOM_IMPL_KEY = "spark.security.secure.random.impl" + val SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB = "spark.job.encryptedIntermediateData.buffer.kb" + val DEFAULT_SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB = "128" + val SPARK_SECURITY_SECURE_RANDOM_DEVICE_FILE_PATH_KEY = "spark.security.random.device.file.path" + val SPARK_SECURITY_SECURE_RANDOM_DEVICE_FILE_PATH_DEFAULT = "/dev/urandom" --- End diff -- This is not only not used anywhere, but probably not cross-platform. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42057147 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42057527 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42057655 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42059715 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoOutputStream.scala --- @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, FilterOutputStream, OutputStream} +import java.nio.ByteBuffer +import java.security.GeneralSecurityException + +import com.google.common.base.Preconditions + +import org.apache.spark.Logging + +/** + * CryptoOutputStream encrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The encryption is buffer based. The key points of the encryption are + * (1) calculating counter and (2) padding through stream position. + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoOutputStream( +out: OutputStream, +private[spark] val codec: CryptoCodec, +bufferSizeVal: Int, +keyVal: Array[Byte], +ivVal: Array[Byte], +streamOffsetVal: Long, +isDirectBuf: Boolean) extends FilterOutputStream(out: OutputStream) with Logging { + var encryptor: Encryptor = null + var streamOffset: Long = 0 + /** + * Padding = pos%(algorithm blocksize); Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put input data + * at proper position. + */ + var padding: Byte = 0 + var closed: Boolean = false + var key: Array[Byte] = null + var initIV: Array[Byte] = null + var iv: Array[Byte] = null + val oneByteBuf: Array[Byte] = new Array[Byte](1) + + CryptoStreamUtils.checkCodec(codec) + var bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + + key = keyVal.clone() + initIV = ivVal.clone() + iv = ivVal.clone() + + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * inBuffer.limit(). + */ + lazy val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSize) + } else { +ByteBuffer.allocate(bufferSize) + } + + /** + * Encrypted data buffer. The data starts at outBuffer.position() and ends at + * outBuffer.limit(); + */ + lazy val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSize) + } else { +ByteBuffer.allocate(bufferSize) + } + streamOffset = streamOffsetVal + encryptor = codec.createEncryptor() + + updateEncryptor() + + def this( --- End diff -- BTW are all these constructors really needed? If everything goes through `createCryptoOutputStream` and `createCryptoInputStream`, you should only need one constructor in each class. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42051545 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables + */ +private[spark] object CommonConfigurationKeys { + val SPARK_SHUFFLE_TOKEN = new Text("SPARK_SHUFFLE_TOKEN") + val SPARK_SECURITY_CRYPTO_BUFFER_SIZE_DEFAULT = 8192 + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT = "AES/CTR/NoPadding" + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY = "spark.security.crypto.cipher.suite" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX = "spark.security.crypto.codec.classes" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY = +SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX + CipherSuite.AES_CTR_NOPADDING.getConfigSuffix() + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_KEY = "spark.security.java.secure.random" + + ".algorithm" + val SPARK_SECURITY_CRYPTO_JCE_PROVIDER_KEY = "spark.security.crypto.jce.provider" + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_DEFAULT = "SHA1PRNG" + val SPARK_SECURITY_SECURE_RANDOM_IMPL_KEY = "spark.security.secure.random.impl" + val SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB = "spark.job.encryptedIntermediateData.buffer.kb" + val DEFAULT_SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB = "128" + val SPARK_SECURITY_SECURE_RANDOM_DEVICE_FILE_PATH_KEY = "spark.security.random.device.file.path" + val SPARK_SECURITY_SECURE_RANDOM_DEVICE_FILE_PATH_DEFAULT = "/dev/urandom" + val SPARK_ENCRYPTED_INTERMEDIATE_DATA = "spark.job.encryptedIntermediateData" --- End diff -- This is not used anywhere. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42052097 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables + */ +private[spark] object CommonConfigurationKeys { + val SPARK_SHUFFLE_TOKEN = new Text("SPARK_SHUFFLE_TOKEN") + val SPARK_SECURITY_CRYPTO_BUFFER_SIZE_DEFAULT = 8192 + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT = "AES/CTR/NoPadding" + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY = "spark.security.crypto.cipher.suite" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX = "spark.security.crypto.codec.classes" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY = +SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX + CipherSuite.AES_CTR_NOPADDING.getConfigSuffix() + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_KEY = "spark.security.java.secure.random" + + ".algorithm" + val SPARK_SECURITY_CRYPTO_JCE_PROVIDER_KEY = "spark.security.crypto.jce.provider" + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_DEFAULT = "SHA1PRNG" + val SPARK_SECURITY_SECURE_RANDOM_IMPL_KEY = "spark.security.secure.random.impl" + val SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB = "spark.job.encryptedIntermediateData.buffer.kb" --- End diff -- Why `spark.job`? It's related to encrypted shuffle, so it should be in `spark.shuffle.crypto`. For example, `spark.shuffle.crypto.bufferSize`. We also don't use units in config keys anymore; you can use `SparkConf.getSizeAsX` to translate values to the units you want, and the user can specify the values in the units they prefer. I'd also rename the constant itself. "INTERMEDIATE" is both confusing and really not necessary. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42053251 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoCodec.scala --- @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.spark.{Logging, SparkConf} +import org.apache.spark.crypto.CommonConfigurationKeys.SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT +import org.apache.spark.crypto.CommonConfigurationKeys.SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY +import org.apache.spark.crypto.CommonConfigurationKeys.SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY + +/** + * Crypto codec class, encapsulates encryptor/decryptor pair. + */ +private[spark] abstract class CryptoCodec { + /** + * + * @return the CipherSuite for this codec. + */ + def getCipherSuite(): CipherSuite + + /** + * This interface is only for Counter (CTR) mode. Generally the Encryptor + * or Decryptor calculates the IV and maintain encryption context internally. + * For example a [[javax.crypto.Cipher]] will maintain its encryption + * context internally when we do encryption/decryption using the + * Cipher#update interface. + * + * The IV can be calculated by combining the initial IV and the counter with + * a lossless operation (concatenation, addition, or XOR). + * @see http://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29 + * @param initIV --- End diff -- Can you explain what each argument is? for example, from the implementation, `iv` seems to be output only; why can't it be a return value? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42054231 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42059236 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -27,7 +27,7 @@ import org.apache.spark.{Logging, SparkException, TaskContext} import org.apache.spark.network.buffer.ManagedBuffer import org.apache.spark.network.shuffle.{BlockFetchingListener, ShuffleClient} import org.apache.spark.shuffle.FetchFailedException -import org.apache.spark.util.Utils +import org.apache.spark.util.{CompletionIterator, Utils} --- End diff -- This change does not seem necessary? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42051371 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables + */ +private[spark] object CommonConfigurationKeys { + val SPARK_SHUFFLE_TOKEN = new Text("SPARK_SHUFFLE_TOKEN") + val SPARK_SECURITY_CRYPTO_BUFFER_SIZE_DEFAULT = 8192 + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT = "AES/CTR/NoPadding" + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY = "spark.security.crypto.cipher.suite" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX = "spark.security.crypto.codec.classes" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY = --- End diff -- I'm a little confused about these config options. Could you write documentation for them in `configuration.md` or in this class's scaladoc? This one `SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY` particularly looks like a mouthful. If I understand correctly it would be "spark.security.crypto.codec.classes.aes.ctr.nopadding". My question is: how is this expected to be used? From what I can understand, you have two configs: one to choose the cipher suite, one to choose the codec. Why can't you have just two configs: `spark.security.crypto.cipher.suite` `spark.security.crypto.codec.classes` And that's it? Why do you need the second one to have a separate, dynamic config key depending on the chose cipher suite? Also, minor, but I'd be explicit about these options being related to the shuffle, so `spark.shuffle` instead of `spark.security`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42052623 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, --- End diff -- Constructors should be at the top of the file. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42054578 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42058261 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoOutputStream.scala --- @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, FilterOutputStream, OutputStream} +import java.nio.ByteBuffer +import java.security.GeneralSecurityException + +import com.google.common.base.Preconditions + +import org.apache.spark.Logging + +/** + * CryptoOutputStream encrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The encryption is buffer based. The key points of the encryption are + * (1) calculating counter and (2) padding through stream position. + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoOutputStream( +out: OutputStream, +private[spark] val codec: CryptoCodec, +bufferSizeVal: Int, +keyVal: Array[Byte], +ivVal: Array[Byte], +streamOffsetVal: Long, +isDirectBuf: Boolean) extends FilterOutputStream(out: OutputStream) with Logging { + var encryptor: Encryptor = null + var streamOffset: Long = 0 + /** + * Padding = pos%(algorithm blocksize); Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put input data + * at proper position. + */ + var padding: Byte = 0 + var closed: Boolean = false + var key: Array[Byte] = null + var initIV: Array[Byte] = null + var iv: Array[Byte] = null + val oneByteBuf: Array[Byte] = new Array[Byte](1) + + CryptoStreamUtils.checkCodec(codec) + var bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + + key = keyVal.clone() + initIV = ivVal.clone() + iv = ivVal.clone() + + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * inBuffer.limit(). + */ + lazy val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSize) + } else { +ByteBuffer.allocate(bufferSize) + } + + /** + * Encrypted data buffer. The data starts at outBuffer.position() and ends at + * outBuffer.limit(); + */ + lazy val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSize) + } else { +ByteBuffer.allocate(bufferSize) + } + streamOffset = streamOffsetVal + encryptor = codec.createEncryptor() + + updateEncryptor() + + def this( --- End diff -- Constructors go at the top of the file. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42059164 --- Diff: core/src/test/scala/org/apache/spark/crypto/JceAesCtrCryptoCodecSuite.scala --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{ByteArrayInputStream, BufferedOutputStream, ByteArrayOutputStream} +import java.security.SecureRandom + +import org.apache.spark.crypto.CommonConfigurationKeys._ +import org.apache.spark.{SparkFunSuite, SparkConf, Logging} + +/** + * test JceAesCtrCryptoCodec + */ +class JceAesCtrCryptoCodecSuite extends SparkFunSuite with Logging { + + test("TestCryptoCodecSuite") { +val random = new SecureRandom +val dataLen = 1000 +val inputData = new Array[Byte](dataLen) +val outputData = new Array[Byte](dataLen) +random.nextBytes(inputData) +// encrypt +val sparkConf = new SparkConf + sparkConf.set(SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY, + classOf[JceAesCtrCryptoCodec].getName) +val cipherSuite = new CipherSuite("AES/CTR/NoPadding", 16) +val codec = new JceAesCtrCryptoCodec(sparkConf) +val aos = new ByteArrayOutputStream +val bos = new BufferedOutputStream(aos) +val key = new Array[Byte](16) +val iv = new Array[Byte](16) +random.nextBytes(key) +random.nextBytes(iv) + +val cos = new CryptoOutputStream(bos, codec, 1024, key, iv) +cos.write(inputData, 0, inputData.length) --- End diff -- Just pinging about this issue; the test doesn't really exercise all existing code paths and it would be really nice to get more coverage given the code is not that trivial. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42059206 --- Diff: core/src/test/scala/org/apache/spark/crypto/JceAesCtrCryptoCodecSuite.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{ByteArrayInputStream, BufferedOutputStream, ByteArrayOutputStream} +import java.security.SecureRandom +import java.util + +import org.scalatest.prop.TableDrivenPropertyChecks + +import org.apache.spark.{SparkFunSuite, SparkConf, Logging} +import org.apache.spark.crypto.CommonConfigurationKeys._ + +/** + * test JceAesCtrCryptoCodec + */ +private[crypto] class JceAesCtrCryptoCodecSuite extends SparkFunSuite with Logging with +TableDrivenPropertyChecks { --- End diff -- nit: indent this line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42051574 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables + */ +private[spark] object CommonConfigurationKeys { + val SPARK_SHUFFLE_TOKEN = new Text("SPARK_SHUFFLE_TOKEN") + val SPARK_SECURITY_CRYPTO_BUFFER_SIZE_DEFAULT = 8192 + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT = "AES/CTR/NoPadding" + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY = "spark.security.crypto.cipher.suite" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX = "spark.security.crypto.codec.classes" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY = +SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX + CipherSuite.AES_CTR_NOPADDING.getConfigSuffix() + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_KEY = "spark.security.java.secure.random" + + ".algorithm" + val SPARK_SECURITY_CRYPTO_JCE_PROVIDER_KEY = "spark.security.crypto.jce.provider" + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_DEFAULT = "SHA1PRNG" + val SPARK_SECURITY_SECURE_RANDOM_IMPL_KEY = "spark.security.secure.random.impl" + val SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB = "spark.job.encryptedIntermediateData.buffer.kb" + val DEFAULT_SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB = "128" + val SPARK_SECURITY_SECURE_RANDOM_DEVICE_FILE_PATH_KEY = "spark.security.random.device.file.path" + val SPARK_SECURITY_SECURE_RANDOM_DEVICE_FILE_PATH_DEFAULT = "/dev/urandom" + val SPARK_ENCRYPTED_INTERMEDIATE_DATA = "spark.job.encryptedIntermediateData" + val SPARK_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS = "spark.job" + + ".encryptedIntermediateDataKeySizeBits" + val DEFAULT_SPARK_ENCRYPTED_INTERMEDIATE_DATA_KEY_SIZE_BITS = 128 + val SPARK_SHUFFLE_KEYGEN_ALGORITHM = "spark.shuffle.keygen.algorithm" --- End diff -- Should be `spark.shuffle.crypto.keygen.algorithm`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42051668 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables + */ +private[spark] object CommonConfigurationKeys { + val SPARK_SHUFFLE_TOKEN = new Text("SPARK_SHUFFLE_TOKEN") + val SPARK_SECURITY_CRYPTO_BUFFER_SIZE_DEFAULT = 8192 + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT = "AES/CTR/NoPadding" + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY = "spark.security.crypto.cipher.suite" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX = "spark.security.crypto.codec.classes" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY = +SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX + CipherSuite.AES_CTR_NOPADDING.getConfigSuffix() + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_KEY = "spark.security.java.secure.random" + + ".algorithm" + val SPARK_SECURITY_CRYPTO_JCE_PROVIDER_KEY = "spark.security.crypto.jce.provider" + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_DEFAULT = "SHA1PRNG" + val SPARK_SECURITY_SECURE_RANDOM_IMPL_KEY = "spark.security.secure.random.impl" + val SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB = "spark.job.encryptedIntermediateData.buffer.kb" + val DEFAULT_SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB = "128" + val SPARK_SECURITY_SECURE_RANDOM_DEVICE_FILE_PATH_KEY = "spark.security.random.device.file.path" --- End diff -- This is not used anywhere. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42053618 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42054099 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42054088 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42054837 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42057302 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42057285 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42057228 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala --- @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{IOException, InputStream, FilterInputStream} +import java.nio.ByteBuffer +import java.nio.channels.ReadableByteChannel +import java.security.GeneralSecurityException +import java.util.Queue +import java.util.concurrent.ConcurrentLinkedQueue + +import com.google.common.base.Preconditions + +/** + * CryptoInputStream decrypts data. It is not thread-safe. AES CTR mode is + * required in order to ensure that the plain text and cipher text have a 1:1 + * mapping. The decryption is buffer based. The key points of the decryption + * are (1) calculating the counter and (2) padding through stream position: + * + * counter = base + pos/(algorithm blocksize); + * padding = pos%(algorithm blocksize); + * + * The underlying stream offset is maintained as state. + */ +private[spark] class CryptoInputStream( +in: InputStream, +private[this] val codec: CryptoCodec, +bufferSizeVal: Integer, +keyVal: Array[Byte], +ivVal: Array[Byte], +private[this] var streamOffset: Long,// Underlying stream offset. +isDirectBuf: Boolean) +extends FilterInputStream(in: InputStream) with ReadableByteChannel { + val oneByteBuf = new Array[Byte](1) + + val bufferSize = CryptoStreamUtils.checkBufferSize(codec, bufferSizeVal) + /** + * Input data buffer. The data starts at inBuffer.position() and ends at + * to inBuffer.limit(). + */ + val inBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * The decrypted data buffer. The data starts at outBuffer.position() and + * ends at outBuffer.limit() + */ + val outBuffer = if (isDirectBuf) { +ByteBuffer.allocateDirect(bufferSizeVal) + } else { +ByteBuffer.allocate(bufferSizeVal) + } + + /** + * Whether the underlying stream supports + * [[org.apache.hadoop.fs.ByteBufferReadable]] + */ + var usingByteBufferRead = false + var usingByteBufferReadInitialized = false + /** + * Padding = pos%(algorithm blocksize) Padding is put into [[inBuffer]] + * before any other data goes in. The purpose of padding is to put the input + * data at proper position. + */ + var padding: Byte = '0' + var closed: Boolean = false + var key: Array[Byte] = keyVal.clone() + var initIV: Array[Byte] = ivVal.clone() + var iv: Array[Byte] = ivVal.clone() + var isReadableByteChannel: Boolean = in.isInstanceOf[ReadableByteChannel] + + /** DirectBuffer pool */ + var bufferPool: Queue[ByteBuffer] = new ConcurrentLinkedQueue[ByteBuffer]() + /** Decryptor pool */ + var decryptorPool: Queue[Decryptor] = new ConcurrentLinkedQueue[Decryptor]() + + lazy val tmpBuf: Array[Byte] = new Array[Byte](bufferSize) + var decryptor: Decryptor = getDecryptor() + CryptoStreamUtils.checkCodec(codec) + resetStreamOffset(streamOffset) + + def this(in: InputStream, + codec: CryptoCodec, + bufferSize: Integer, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, bufferSize, key, iv, 0, true) + } + + def this(in: InputStream, + codec: CryptoCodec, + key: Array[Byte], + iv: Array[Byte]) { +this(in, codec, CryptoStreamUtils.getBufferSize, key, iv) + } + + def getWrappedStream(): InputStream = in + + /** + * Decryption is buffer based. + * If there is data in [[outBuffer]], then read it out of this buffer. + * If there is no data in [[outBuffer]], then read more from
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42059483 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables + */ +private[spark] object CommonConfigurationKeys { + val SPARK_SHUFFLE_TOKEN = new Text("SPARK_SHUFFLE_TOKEN") + val SPARK_SECURITY_CRYPTO_BUFFER_SIZE_DEFAULT = 8192 + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT = "AES/CTR/NoPadding" + val SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY = "spark.security.crypto.cipher.suite" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX = "spark.security.crypto.codec.classes" + val SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY = +SPARK_SECURITY_CRYPTO_CODEC_CLASSES_KEY_PREFIX + CipherSuite.AES_CTR_NOPADDING.getConfigSuffix() + val SPARK_SECURITY_JAVA_SECURE_RANDOM_ALGORITHM_KEY = "spark.security.java.secure.random" + --- End diff -- As with other settings, this is shuffle related, so `spark.shuffle.crypto`, not `spark.security`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42060005 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoStreamUtils.scala --- @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import java.io.{InputStream, OutputStream} +import java.nio.ByteBuffer + +import com.google.common.base.Preconditions +import sun.misc.Cleaner +import sun.nio.ch.DirectBuffer + +import org.apache.spark.crypto.CommonConfigurationKeys._ +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.SparkConf + +/** + * A util class for CryptoInputStream and CryptoOutputStream + */ +private[spark] object CryptoStreamUtils { + val MIN_BUFFER_SIZE: Int = 512 + + /** Forcibly free the direct buffer. */ + def freeDB(buffer: ByteBuffer): Unit = { +if (buffer.isInstanceOf[DirectBuffer]) { + val bufferCleaner: Cleaner = (buffer.asInstanceOf[DirectBuffer]).cleaner + bufferCleaner.clean --- End diff -- nit: `clean()` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42052749 --- Diff: core/src/main/scala/org/apache/spark/crypto/CommonConfigurationKeys.scala --- @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import org.apache.hadoop.io.Text + +/** + * Constant variables + */ +private[spark] object CommonConfigurationKeys { + val SPARK_SHUFFLE_TOKEN = new Text("SPARK_SHUFFLE_TOKEN") + val SPARK_SECURITY_CRYPTO_BUFFER_SIZE_DEFAULT = 8192 --- End diff -- Another one that's a bit confusing. You return this value from a method (`CryptoStreamUtils.getBufferSize`), so that method doesn't really seem necessary. From the code it also seems related to `SPARK_ENCRYPTED_INTERMEDIATE_DATA_BUFFER_KB` but both have different default values. Looking at the way the input and output streams are created, it seems to me that these two configs are indeed the same. Could you clean this up? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r42058945 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -823,6 +822,14 @@ private[spark] class Client( val securityManager = new SecurityManager(sparkConf) amContainer.setApplicationACLs( YarnSparkHadoopUtil.getApplicationAclsForYarn(securityManager).asJava) + +if (CryptoConf.isShuffleEncryted(sparkConf)) { + CryptoConf.initSparkShuffleCredentials(sparkConf, credentials) + val dob: DataOutputBuffer = new DataOutputBuffer --- End diff -- Everything after this line seems unnecessary; that's exactly what `setupSecurityToken` already does. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-148217525 Hi @winningsix , I think this patch needs a lot of cleanup around the configuration part. It's really confusing at the moment, and among other issues it seems like there are duplicate, conflicting settings. The input stream implementation is also hard to follow, although I think I convinced myself it's doing the right thing. It would be nice to try to simplify it, so that others can more easily understand the code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user winningsix commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-148256354 Hi @vanzin, thank you for your review and comments. I will work on the code refactory of crypto input/output stream part. Before that, I will update the patch by resolving the configuration related issues and completing the configuration.md. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147441714 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147443941 [Test build #43569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43569/consoleFull) for PR 8880 at commit [`6116b2c`](https://github.com/apache/spark/commit/6116b2c7adc7f54aa152fbd89f7a914d24aedb9e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147441746 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147473474 [Test build #43569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43569/console) for PR 8880 at commit [`6116b2c`](https://github.com/apache/spark/commit/6116b2c7adc7f54aa152fbd89f7a914d24aedb9e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147473627 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147473629 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43569/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147576477 [Test build #43608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43608/consoleFull) for PR 8880 at commit [`89736c0`](https://github.com/apache/spark/commit/89736c05867c7edd7ece62cd8b674cb6a1f1a012). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147575412 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147575419 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147598348 [Test build #43608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43608/console) for PR 8880 at commit [`89736c0`](https://github.com/apache/spark/commit/89736c05867c7edd7ece62cd8b674cb6a1f1a012). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147598394 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147598395 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43608/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147292960 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147292917 [Test build #43555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43555/console) for PR 8880 at commit [`843fd72`](https://github.com/apache/spark/commit/843fd72713f56d5447a5988c33cec05ae6348549). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147292961 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43555/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147281265 [Test build #43555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43555/consoleFull) for PR 8880 at commit [`843fd72`](https://github.com/apache/spark/commit/843fd72713f56d5447a5988c33cec05ae6348549). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147280813 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147280801 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147056390 [Test build #43527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43527/consoleFull) for PR 8880 at commit [`2f31c3b`](https://github.com/apache/spark/commit/2f31c3baceaa9711dbb9d90a9797e6257e78e765). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147050557 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147068395 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147068396 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43527/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147068377 [Test build #43527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43527/console) for PR 8880 at commit [`2f31c3b`](https://github.com/apache/spark/commit/2f31c3baceaa9711dbb9d90a9797e6257e78e765). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor ` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147069789 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43525/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147069788 Build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147069754 [Test build #43525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43525/console) for PR 8880 at commit [`37d6121`](https://github.com/apache/spark/commit/37d612111e0d1897e70d3acd97580d4405868807). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * ` class JceAesCtrCipher(mode: Int, provider: String) extends Encryptor with Decryptor ` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147047849 Build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147047862 Build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147048107 [Test build #43525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43525/consoleFull) for PR 8880 at commit [`37d6121`](https://github.com/apache/spark/commit/37d612111e0d1897e70d3acd97580d4405868807). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8880#issuecomment-147050562 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-10771: Implement the shuffle encryption ...
Github user winningsix commented on a diff in the pull request: https://github.com/apache/spark/pull/8880#discussion_r41608824 --- Diff: core/src/main/scala/org/apache/spark/crypto/CryptoCodec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.crypto + +import scala.reflect.runtime.universe + +import org.apache.spark.{Logging, SparkConf} +import org.apache.spark.crypto.CommonConfigurationKeys.SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT +import org.apache.spark.crypto.CommonConfigurationKeys.SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY +import org.apache.spark.crypto.CommonConfigurationKeys.SPARK_SECURITY_CRYPTO_CODEC_CLASSES_AES_CTR_NOPADDING_KEY + +/** + * Crypto codec class, encapsulates encryptor/decryptor pair. + */ +abstract class CryptoCodec() { + /** + * + * @return the CipherSuite for this codec. + */ + def getCipherSuite(): CipherSuite + + /** + * This interface is only for Counter (CTR) mode. Generally the Encryptor + * or Decryptor calculates the IV and maintain encryption context internally. + * For example a {@link javax.crypto.Cipher} will maintain its encryption + * context internally when we do encryption/decryption using the + * Cipher#update interface. + * + * The IV can be calculated by combining the initial IV and the counter with + * a lossless operation (concatenation, addition, or XOR). + * @see http://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29 + * @param initIV + * @param counter + * @param iv initialization vector + */ + def calculateIV(initIV: Array[Byte], counter: Long, iv: Array[Byte]) + + /** + * @return Encryptor the encryptor + */ + def createEncryptor: Encryptor + + /** + * @return Decryptor the decryptor + */ + def createDecryptor: Decryptor + + /** + * Generate a number of secure, random bytes suitable for cryptographic use. + * This method needs to be thread-safe. + * @param bytes byte array to populate with random data + */ + def generateSecureRandom(bytes: Array[Byte]) +} + +object CryptoCodec extends Logging { + def getInstance(conf: SparkConf): CryptoCodec = { +val name = conf.get(SPARK_SECURITY_CRYPTO_CIPHER_SUITE_KEY, + SPARK_SECURITY_CRYPTO_CIPHER_SUITE_DEFAULT) +getInstance(conf, CipherSuite.apply(name)) + } + + def getInstance(conf: SparkConf, cipherSuite: CipherSuite): CryptoCodec = { +getCodecClasses(conf, cipherSuite).toIterator.map { + case name if name == classOf[JceAesCtrCryptoCodec].getName => new JceAesCtrCryptoCodec(conf) --- End diff -- Yes, I will add some TODOs here. As discussed in https://github.com/apache/spark/pull/5307, we will add AES-NI cipher once we decide how to do it(include Chimera into Spark or just leverage the library). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org