Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/4491
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-85367625
[Test build #29055 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29055/consoleFull)
for PR 4491 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-85367646
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-85348209
@steveloughran: I have updated the code according to your comments: [make
CryptoOutputStream.scala#close
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-85348255
[Test build #29055 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29055/consoleFull)
for PR 4491 at commit
Github user steveloughran commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-84934768
@kellyzly
I don't understand why we need to make CryptoOutputStream.scala#close
thread-safe. Is there a situation where multiple threads call this function at
the same time?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-84895270
[Test build #28986 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28986/consoleFull)
for PR 4491 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-84933397
[Test build #28986 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28986/consoleFull)
for PR 4491 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-84933426
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-84855188
@steveloughran: I don't understand why we need to make
CryptoOutputStream.scala#close thread-safe. Is there a situation where
multiple threads call this function at the same time?
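The thread-safety of close() comes up repeatedly in this thread. As a hedged illustration only (the class name and structure below are mine, not the PR's actual code), one common way to make close() idempotent and safe under concurrent callers is an AtomicBoolean guard:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not taken from the PR: the guard makes close()
// idempotent, so a second (possibly concurrent) call becomes a no-op
// instead of releasing the underlying resources twice.
class SafeCloseOutputStream extends FilterOutputStream {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    SafeCloseOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void close() throws IOException {
        // compareAndSet lets exactly one caller win the close; all other
        // callers observe `closed == true` and return immediately.
        if (closed.compareAndSet(false, true)) {
            super.close();
        }
    }
}
```

A double close() on such a wrapper is harmless, which is the kind of guarantee libraries like HBase tend to expect from streams they manage.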
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-84893111
@steveloughran: in Hadoop, if we need to add a native lib path to the Hadoop
execution path, we need to export LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=x
in hadoop,
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83958066
[Test build #28917 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28917/consoleFull)
for PR 4491 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83972971
[Test build #28917 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28917/consoleFull)
for PR 4491 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83972984
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user steveloughran commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83972651
Most of the sync changes in HADOOP-1170 were to deal with HBase's
expectations. Even if you don't think it matters for its current use in Spark,
you should make the
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83371597
[Test build #28855 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28855/consoleFull)
for PR 4491 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83371598
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83381772
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83381297
[Test build #28858 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28858/consoleFull)
for PR 4491 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83381766
[Test build #28858 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28858/consoleFull)
for PR 4491 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83370946
[Test build #28855 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28855/consoleFull)
for PR 4491 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83379582
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83379060
[Test build #28857 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28857/consoleFull)
for PR 4491 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83379569
[Test build #28857 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28857/consoleFull)
for PR 4491 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83401413
[Test build #28861 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28861/consoleFull)
for PR 4491 at commit
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83426561
@tgravescs, @vanzin, @srowen, @JoshRosen, @CodingCat: I have updated the code
for your previous comments. Big changes are made in this submission:
1. Delete hadoop2.6
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83427453
[Test build #28861 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28861/consoleFull)
for PR 4491 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83427465
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user kellyzly commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26818763
--- Diff: core/src/main/scala/org/apache/spark/crypto/CipherSuite.scala ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83903992
@steveloughran: Thanks for your valuable suggestions.
* I will take the latest crypto code from Hadoop trunk and rewrite it
in Scala. I looked into
Github user kellyzly commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26818889
--- Diff:
core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala ---
@@ -0,0 +1,428 @@
+/*
+ * Licensed to the Apache Software
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26760844
--- Diff: core/src/main/scala/org/apache/spark/crypto/CipherSuite.scala ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26760061
--- Diff:
core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala ---
@@ -0,0 +1,428 @@
+/*
+ * Licensed to the Apache Software
Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26760312
--- Diff:
core/src/main/scala/org/apache/spark/crypto/CryptoInputStream.scala ---
@@ -0,0 +1,428 @@
+/*
+ * Licensed to the Apache Software
Github user steveloughran commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-83615286
1. the `InterfaceAudience.Private` tags in Hadoop are a `please don't
use` hint, although if you look at YARN AMs, they end up importing and using
stuff which is
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26474163
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
@@ -123,12 +133,30 @@ private[spark] class DiskBlockObjectWriter(
Github user kellyzly commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26463740
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
@@ -123,12 +133,30 @@ private[spark] class DiskBlockObjectWriter(
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26455671
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
@@ -123,12 +133,30 @@ private[spark] class DiskBlockObjectWriter(
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26455889
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
@@ -123,12 +133,30 @@ private[spark] class DiskBlockObjectWriter(
Github user kellyzly commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26455600
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
@@ -123,12 +133,30 @@ private[spark] class DiskBlockObjectWriter(
Github user kellyzly commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26455837
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
@@ -123,12 +133,30 @@ private[spark] class DiskBlockObjectWriter(
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26455874
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
@@ -123,12 +133,30 @@ private[spark] class DiskBlockObjectWriter(
Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r26425372
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
@@ -123,12 +133,30 @@ private[spark] class DiskBlockObjectWriter(
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-77904109
Hi @kellyzly ,
Renaming the PR sounds fine. But I see that the PR still has the old code.
Are you planning on having the updated code up here soon? Otherwise, as
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-77972708
@vanzin: I have renamed the JIRA from "Reuse hadoop encrypted shuffle
algorithm to enable spark encrypted shuffle" to "Add encrypted shuffle in
spark". I will update
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-77786363
@srowen, @tgravescs, @vanzin: Encrypted shuffle can make the shuffle process
more secure. I think it is necessary in Spark. The previous design was reusing
Hadoop's encrypted
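The core idea behind the encrypted shuffle above can be sketched as follows. This is a hedged illustration using the plain JDK javax.crypto API, not the PR's CryptoInputStream/CryptoOutputStream classes; the class and method names are mine:

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: wrap the streams that write and fetch shuffle
// blocks in cipher streams, so the bytes on disk / on the wire are
// ciphertext while the application sees plaintext.
public class ShuffleCryptoSketch {
    private static Cipher cipher(int mode, byte[] key, byte[] iv) throws Exception {
        // AES/CTR/NoPadding is the default mode of Hadoop's crypto streams.
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    /** Encrypts data and decrypts it back, as the write/fetch sides would. */
    static byte[] roundTrip(byte[] data, byte[] key, byte[] iv) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Map/write side: wrap the block writer's stream.
        try (OutputStream out =
                 new CipherOutputStream(sink, cipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(data);
        }
        // Reduce/fetch side: wrap the fetched stream with the same key/IV.
        try (InputStream in = new CipherInputStream(
                 new ByteArrayInputStream(sink.toByteArray()),
                 cipher(Cipher.DECRYPT_MODE, key, iv))) {
            return in.readAllBytes();
        }
    }
}
```

CTR mode is a natural fit here because it supports random access into the stream, which shuffle fetches of individual blocks need; the real work in the PR is key distribution and plugging such wrappers into the block writer and fetcher paths.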
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-77527956
Mind closing this PR? This change can't proceed. I don't see why this code
would be removed from Hadoop, or why a Spark version has to copy it.
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-77521213
@tgravescs: My teammates and I are trying to strip classes like
CryptoInputStream.java and CryptoOutputStream.java out of the
org.apache.hadoop.crypto package in Hadoop.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74228545
@kellyzly This change is a non-starter if it _requires_ Hadoop 2.6. Right
now Spark works with versions back to about 1.0.4. Is this avoidable?
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74218847
@srowen: the unit test fails. The error message is like
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74277671
@kellyzly There is a JIRA to clean up anywhere we are using private
interfaces. There may be a few we can't avoid, but we should minimize them.
In this case I
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74278181
@kellyzly I think copying the CryptoInputStream and CryptoOutputStream
classes would be more appropriate, but my question is: do those classes use
private interfaces? Do
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74308722
Hi @kellyzly ,
I want to ask a question: why can org.apache.hadoop.mapreduce.MRJobConfig,
which is marked InterfaceAudience.Private, be imported in
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74129728
I agree with @tgravescs that we should avoid using private APIs where
possible. On top of that, this patch would break the build for anything before
hadoop-2.6. If using these
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74187312
[Test build #27406 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27406/consoleFull)
for PR 4491 at commit
Github user kellyzly commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r24636611
--- Diff: pom.xml ---
@@ -1600,6 +1600,19 @@
</profile>
<profile>
+  <id>hadoop-2.6</id>
+  <properties>
+
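The archive stripped the angle brackets from the diff above, and the profile body is truncated. Under the assumption that it followed Spark's other Hadoop profiles (the version value below is my guess, not recovered from the diff), the added profile presumably looked roughly like:

```xml
<profile>
  <id>hadoop-2.6</id>
  <properties>
    <!-- assumed value; the actual properties are truncated in the archive -->
    <hadoop.version>2.6.0</hadoop.version>
  </properties>
</profile>
```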
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74187433
[Test build #27406 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27406/consoleFull)
for PR 4491 at commit
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r24636717
--- Diff: pom.xml ---
@@ -1600,6 +1600,19 @@
</profile>
<profile>
+  <id>hadoop-2.6</id>
+  <properties>
+
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74186838
ok to test
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74187436
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74207662
@tgravescs @vanzin @srowen @JoshRosen
I was glad to receive all of your suggestions.
I have summarized them:
* CryptoInputStream and
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74207814
@tgravescs
Did you test it with the YARN external shuffle service?
Sorry, I have not tested that situation and will test it later.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74210025
[Test build #27418 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27418/consoleFull)
for PR 4491 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74209904
[Test build #27418 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27418/consoleFull)
for PR 4491 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74210027
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74217212
@tgravescs: I want to ask a question: why can
org.apache.hadoop.mapreduce.MRJobConfig, which is marked
InterfaceAudience.Private, be imported in
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r24583118
--- Diff: pom.xml ---
@@ -1600,6 +1600,19 @@
</profile>
<profile>
+  <id>hadoop-2.6</id>
+  <properties>
+
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r24583376
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -18,12 +18,22 @@
package org.apache.spark.storage
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r24583172
--- Diff: pom.xml ---
@@ -1600,6 +1600,19 @@
</profile>
<profile>
+  <id>hadoop-2.6</id>
+  <properties>
+
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r24583495
--- Diff: pom.xml ---
@@ -1600,6 +1600,19 @@
</profile>
<profile>
+  <id>hadoop-2.6</id>
--- End diff --
we would want
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r24583433
--- Diff: pom.xml ---
@@ -1600,6 +1600,19 @@
</profile>
<profile>
+  <id>hadoop-2.6</id>
+  <properties>
+
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74081017
The main issue I see here is that we are using private things from Hadoop and
MapReduce. We've run into issues with this in the past, so I think we should
stay away from
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r24552210
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -39,6 +39,14 @@ import
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74002302
/cc @tgravescs (since you designed SecurityManager). Also, @sryza,
@vanzin, and @jacek-lewandowski might be interested in this.
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4491#discussion_r24552192
--- Diff: pom.xml ---
@@ -1581,9 +1581,9 @@
</profile>
<profile>
-  <id>hadoop-2.4</id>
--- End diff --
Does this imply
Github user kellyzly commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-74014097
@JoshRosen: I have updated the patch: I added back the hadoop-2.4 profile I
deleted before and added a hadoop-2.6 profile.
GitHub user kellyzly opened a pull request:
https://github.com/apache/spark/pull/4491
[SPARK-5682] Reuse hadoop encrypted shuffle algorithm to enable spark en...
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kellyzly/spark
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4491#issuecomment-73632277
Can one of the admins verify this patch?