Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
I'm preparing a PR for 2.3, thanks for reminding!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
We should pull this back into Spark 2.3 at least. I don't think this is a
clean cherry-pick due to the barrier scheduling stuff; would you be willing to put
up a PR?
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22112
Thanks! Merged to master.
---
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
testing so far looks good. I'm +1 for this.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22112
The current solution looks good to me for unblocking the Apache Spark 2.4
release. We definitely should continue improving the fix, as the other
reviewers suggested above.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
@tgravescs thanks for testing it out! I've created
https://issues.apache.org/jira/browse/SPARK-25341 and
https://issues.apache.org/jira/browse/SPARK-25342 to track the followup.
I think
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95713/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95713 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95713/testReport)**
for PR 22112 at commit
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
yeah we should file a separate jira to look at the shuffle output. I'm
running a few stress tests and will let you know how those go.
could you file a jira for that and link to this
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95713 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95713/testReport)**
for PR 22112 at commit
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95701/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95701 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95701/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95701 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95701/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
retest this please
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95697 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95697/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95697/
Test FAILed.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
Any more comments? cc @tgravescs @mridulm @markhamstra
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95697 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95697/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
@tgravescs yes you are right about the problem here. Instead of asking
executors to remove old committed shuffle data, I prefer #6648, which just
writes new shuffle data with a different file
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
yeah you would have to be able to handle network partitioning somehow. I
don't know how difficult it is, but it's definitely work we may not want to do
here. I was trying to clarify and make
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22112
> So in order to fix that we would need a way to tell the executors to
remove that older committed shuffle data
@tgravescs It is also hard to implement such a robust solution for
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
ok for anyone else trying, I was able to reproduce this consistently with
the following code, adding in more repartitions. I have blacklisting, dynamic
allocation, and external shuffle service
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
also thanks for adding the test cases, did you have to run that many times
to reproduce?
One thing to note for others is you have to have external shuffle off. I
haven't been able to
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
To clarify your last few comments, I think you are saying if you were to
fail all the reduce tasks, the shuffle write data is still there and doesn't
get removed and since first write wins on
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95607/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95607 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95607/testReport)**
for PR 22112 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95607 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95607/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95597/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95597 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95597/testReport)**
for PR 22112 at commit
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
Update, according to the discussion in
https://github.com/apache/spark/pull/9214 , the current behavior of shuffle
writing is: "first write wins". We can't simply change it to "last write wins",
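
For readers following along, here is a toy sketch of the "first write wins" rule described above. It is not Spark's shuffle code: `ShuffleStore`, `commit`, and `indeterminate_map_output` are hypothetical names used only for illustration, and the model assumes one committed output per (shuffle id, map id) pair.

```python
import random


class ShuffleStore:
    """Toy stand-in for committed shuffle files (hypothetical, not Spark's API)."""

    def __init__(self):
        self._committed = {}

    def commit(self, shuffle_id, map_id, records):
        """Keep the first committed output for a map task; ignore later attempts."""
        key = (shuffle_id, map_id)
        if key not in self._committed:
            self._committed[key] = list(records)
        # Readers always see whatever was committed first for this key.
        return self._committed[key]


def indeterminate_map_output(seed):
    """Return the same rows in an order that depends on the attempt (the seed)."""
    rows = list(range(6))
    random.Random(seed).shuffle(rows)
    return rows


store = ShuffleStore()
first = store.commit(0, 0, indeterminate_map_output(seed=1))
retry = store.commit(0, 0, indeterminate_map_output(seed=2))
```

In this sketch `retry` equals `first` even if the second attempt produced its rows in a different order, which is the sense in which the first write wins.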
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95597 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95597/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95577/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95577 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95577/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95577 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95577/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95574/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95574 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95574/testReport)**
for PR 22112 at commit
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
BTW checkpoint also works
![image](https://user-images.githubusercontent.com/3182036/44943367-d365ed80-adf7-11e8-98e9-574c13d1fb05.png)
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
Sorry to be late, as this bug is really hard to reproduce. We need a fetch
failure to happen after an indeterminate map stage; we also need a large
cluster, so that a fetch failure doesn't lose all
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95574 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95574/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
yeah that doesn't reproduce it, you really need a fetch failure in there
and would think some kind of randomness or order in the map output, I've
started to try to write something to reproduce
Github user mccheah commented on the issue:
https://github.com/apache/spark/pull/22112
@cloud-fan @tgravescs was wondering if we could get an ETA on this landing?
Also, I tried running something analogous to the example script from the
description of
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
looking. So what all have you done for testing on this? Any manual
testing with the checkpoints, etc?
I'll try to run some today.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
@tgravescs @mridulm @squito @markhamstra Any more comments? This blocks 2.4
and I'm going to merge it in the next one or two days, if none of you objects.
Thanks!
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22112
LGTM
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95426/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95426 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95426/testReport)**
for PR 22112 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #4302 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4302/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95421/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95421 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95421/testReport)**
for PR 22112 at commit
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/22112
ping @tgravescs @mridulm @squito @markhamstra
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95420/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95420 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95420/testReport)**
for PR 22112 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95426 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95426/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/22112
retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95419/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95419 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95419/testReport)**
for PR 22112 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #4302 has
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4302/testReport)**
for PR 22112 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95421 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95421/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/22112
retest this please
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95420 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95420/testReport)**
for PR 22112 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #95419 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95419/testReport)**
for PR 22112 at commit
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/22112
retest this please
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #4300 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4300/testReport)**
for PR 22112 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #4301 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4301/testReport)**
for PR 22112 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22112
**[Test build #4300 has
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4300/testReport)**
for PR 22112 at commit