GitHub user a-roberts commented on the issue:
https://github.com/apache/spark/pull/14961
[info] - using external shuffle service *** FAILED *** (1 minute)
[info] java.util.concurrent.TimeoutException: Can't find 2 executors
before 60000 milliseconds elapsed
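For readers unfamiliar with this failure mode, here is a minimal sketch of a poll-until-timeout wait that would produce a message of this shape. It is an illustration only, not Spark's implementation: the class name `ExecutorWait`, the method signature, the `IntSupplier` parameter, and the 10 ms polling interval are all assumptions made for the example.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.IntSupplier;

// Hypothetical helper, not taken from the Spark code base.
final class ExecutorWait {
    /**
     * Polls executorCount until it reports at least `expected` executors,
     * or throws TimeoutException once timeoutMs has elapsed.
     */
    static void waitUntilExecutorsUp(IntSupplier executorCount, int expected, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        while (System.nanoTime() - deadline < 0) {
            if (executorCount.getAsInt() >= expected) {
                return;
            }
            Thread.sleep(10); // assumed polling interval
        }
        throw new TimeoutException(
            "Can't find " + expected + " executors before " + timeoutMs + " milliseconds elapsed");
    }

    public static void main(String[] args) throws Exception {
        // A supplier that already reports two executors returns immediately.
        waitUntilExecutorsUp(() -> 2, 2, 1000);
        System.out.println("executors are up");
    }
}
```

With a wait like this, executors that never register (for example because a thread they depend on is blocked) surface only as a timeout, which is why the log alone does not say what went wrong.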
60 seconds really is an eternity for this test, and I can't reproduce the
failure on my local setup. I suspect we have a deadlock after the upgrade,
which would need some proper debugging (again, if only I could reproduce it
on my test systems, where I have access to tools like gdb, healthcenter, and
the servicing APIs we use here). My systems have between two and eight cores,
and I know this farm has a lot more available, so it could be that having
more cores increases the chance of thread contention.
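If the failure can be caught while the test is hanging, one way to confirm or rule out a deadlock from inside the JVM is the standard `java.lang.management` API. The sketch below is only a suggestion: the class name `DeadlockProbe` is hypothetical, and it assumes the JVM supports monitoring of ownable synchronizers (otherwise `findDeadlockedThreads` throws `UnsupportedOperationException`).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper, not part of Spark.
final class DeadlockProbe {
    /** Returns a report of deadlocked threads, or an empty string if none are found. */
    static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Covers both object monitors and java.util.concurrent ownable synchronizers.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return ""; // null means no deadlock was detected
        }
        StringBuilder report = new StringBuilder();
        // Request locked monitors and locked synchronizers for each thread.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) {
                report.append(info);
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "no deadlock detected" : report);
    }
}
```

An empty report would point away from a true lock cycle and toward contention or a missed notification. A `jstack` dump of the hung test JVM would be a comparable external check.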
I had a look at other pull requests being tested, and on a good run this test
typically completes in under 5 seconds:

using external shuffle service (3 seconds, 822 milliseconds)
https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3258/consoleText

using external shuffle service (4 seconds, 543 milliseconds)
https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3233/consoleText