Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/13482
> So why don't we just take out the notifyAll call when we get a
> GetExecutorLossReason?

If that helps, it's OK too. It would probably increase slightly the time it
takes for the driver to learn the loss reason.
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/13482
So why don't we just take out the notifyAll call when we get a
GetExecutorLossReason?
We can add a parameter to resetAllocatorInterval() so that it resets the
interval but doesn't call notifyAll.
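A minimal sketch of that idea, assuming a boolean parameter; the field and
method names below are modeled on Spark's `ApplicationMaster` but are
illustrative, not the actual patch:

```scala
// Illustrative sketch only: the fields are assumptions modeled on
// ApplicationMaster; `wakeUp` is the hypothetical new parameter.
class AllocatorIntervalSketch(initialAllocationInterval: Long) {
  private val allocatorLock = new Object
  private var nextAllocationInterval = initialAllocationInterval

  // Reset the allocation backoff; only wake the reporter thread when asked.
  def resetAllocatorInterval(wakeUp: Boolean = true): Unit =
    allocatorLock.synchronized {
      nextAllocationInterval = initialAllocationInterval
      if (wakeUp) {
        // Skipping notifyAll() for GetExecutorLossReason avoids the lock
        // interaction behind the hang; the trade-off is waiting up to one
        // poll interval before the loss reason is reported.
        allocatorLock.notifyAll()
      }
    }
}
```

This is the trade-off vanzin notes above: without the wake-up, the driver
learns the loss reason on the next regular poll rather than immediately.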
Github user andrewor14 commented on the issue:
https://github.com/apache/spark/pull/13482
I think this is important to fix for 2.0, but I personally found the changes
in this patch rather confusing. If there's a simpler workaround we could do
(such as the solution I suggested, if that works), I would prefer it.
Github user andrewor14 commented on the issue:
https://github.com/apache/spark/pull/13482
@rdblue the reason for the hang is the `GetExecutorLossReason`, right? AFAIK
we send one to the AM every time an executor dies. What if we just keep a set
of executor IDs we're waiting to kill?
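To make that workaround concrete, here is a hedged sketch of a pending-kill
set; the class and method names are hypothetical, not Spark's actual fields:

```scala
import scala.collection.mutable

// Hypothetical sketch of the suggestion: remember which executor IDs
// already have a GetExecutorLossReason in flight, so we never issue a
// second blocking ask for the same executor.
class PendingLossReasons {
  private val pending = mutable.HashSet.empty[String]

  // True only the first time an executor ID is seen; the caller sends the
  // GetExecutorLossReason request only when this returns true.
  def markPending(executorId: String): Boolean = synchronized {
    pending.add(executorId)
  }

  // Called once the AM replies (or the executor is confirmed gone).
  def clear(executorId: String): Unit = synchronized {
    pending.remove(executorId)
  }
}
```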
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/13482
Seems like an OK workaround to me; we really should spend some time looking
at removing some of those locks and avoiding `askWithRetry` (which shouldn't
ever be needed with a reliable RPC library).
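For context, `askWithRetry` blocks the calling thread (often while holding a
lock) until a reply or timeout, while a plain `ask` returns a `Future`. A
simplified sketch of the non-blocking direction, using stand-in types rather
than Spark's real `RpcEndpointRef`:

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for an RPC reference; Spark's RpcEndpointRef.ask similarly
// returns a Future instead of blocking like askWithRetry.
trait EndpointRef {
  def ask[T](message: Any): Future[T]
}

// Handle the loss reason on a callback so no thread parks on a lock
// waiting for the AM to answer.
def fetchLossReason(am: EndpointRef, executorId: String): Unit =
  am.ask[String](("GetExecutorLossReason", executorId)).foreach { reason =>
    println(s"executor $executorId lost: $reason")
  }
```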
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/13482
@rdblue could you follow the usual convention in the PR title
(`[SPARK-15725][yarn] Blah`)? thx
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/13482
cc @vanzin @tgravescs
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/13482
@yhuai, @rxin, we should consider this workaround for 2.0 if it isn't too
late. We see a lot of apps fail because the driver and AM lock up.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/13482
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/13482
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59895/
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/13482
**[Test build #59895 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59895/consoleFull)**
for PR 13482 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/13482
**[Test build #59895 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59895/consoleFull)**
for PR 13482 at commit