GitHub user ilganeli opened a pull request:
https://github.com/apache/spark/pull/5277
[SPARK-6492][CORE] SparkContext.stop() can deadlock when
DAGSchedulerEventProcessLoop dies
I've added a timeout and retry loop around the SparkContext shutdown code
that should fix this deadlock. If a shutdown is already in progress when
another thread calls stop(), that thread waits up to 10 seconds for the lock,
then falls through so that the outer loop can re-submit the request.
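A minimal sketch of the timeout-and-retry idea described above, not the actual patch. It assumes a java.util.concurrent ReentrantLock guards the shutdown; the class name StoppableContext, the stopLock field, and the shutdownRuns counter are hypothetical and exist only for illustration.

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock

// Hypothetical stand-in for the context being shut down; not Spark's real class.
class StoppableContext(lockTimeoutSeconds: Long = 10L) {
  private val stopLock = new ReentrantLock()
  @volatile private var stopped = false
  @volatile private var shutdownRuns = 0

  def isStopped: Boolean = stopped
  def runs: Int = shutdownRuns

  def stop(): Unit = {
    // Outer loop: re-submit the stop request until the context is stopped.
    while (!stopped) {
      // Wait a bounded time for the lock instead of blocking forever.
      if (stopLock.tryLock(lockTimeoutSeconds, TimeUnit.SECONDS)) {
        try {
          if (!stopped) {
            shutdownRuns += 1 // placeholder for the real shutdown sequence
            stopped = true
          }
        } finally {
          stopLock.unlock()
        }
      }
      // On timeout, fall through and let the loop try again.
    }
  }
}
```

A caller that times out never blocks indefinitely: it either sees that another thread has finished the shutdown and exits the loop, or it tries for the lock again.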
Also, I've moved the `stopped = true` assignment to the end of the
shutdown sequence. Otherwise the sequence might not complete successfully,
yet the context would still be labelled as stopped.
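To illustrate why the position of the flag matters, here is a small sketch under stated assumptions: the Shutdown class and its list of steps are hypothetical, and the real shutdown sequence in SparkContext is more involved.

```scala
// Hypothetical shutdown sequence; the steps are illustrative only.
class Shutdown(steps: Seq[() => Unit]) {
  @volatile private var stopped = false

  def isStopped: Boolean = stopped

  def stop(): Unit = {
    steps.foreach(step => step()) // any exception here leaves stopped == false
    stopped = true                // set only after every step has completed
  }
}
```

If a step throws, the flag stays false, so a later call to stop() runs the sequence again instead of treating the context as already stopped.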
I would appreciate feedback from folks more familiar with the shutdown code
to confirm that it's OK to run some of the items in the shutdown sequence
multiple times in case the entire sequence doesn't complete successfully the
first time. I think this should be fixed rather than left as it is now: in my
opinion, a double cleanup with an error is better than an improper cleanup.
I also added a null check for the dagScheduler object, which could otherwise
trigger a NullPointerException.
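A sketch of the kind of guard meant here, assuming the scheduler reference may be null when stop() runs. FakeScheduler and Context are hypothetical stand-ins, and clearing the reference after stopping is an assumption of this sketch rather than something stated in the patch description.

```scala
// Hypothetical stand-in for the scheduler; not Spark's DAGScheduler.
class FakeScheduler {
  var stopCalls = 0
  def stop(): Unit = stopCalls += 1
}

class Context(private var dagScheduler: FakeScheduler) {
  def stopScheduler(): Unit = {
    // Guard against a scheduler that was never created or was already cleared.
    if (dagScheduler != null) {
      dagScheduler.stop()
      dagScheduler = null // assumption: drop the reference so a retry is a no-op
    }
  }
}
```

With the guard in place, calling stopScheduler() a second time, or on a context whose scheduler was never set, does nothing instead of throwing.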
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ilganeli/spark SPARK-6492
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5277.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5277
----
commit 343cb941d4ffde4c8a1552ddf693d23a27d8388f
Author: Ilya Ganelin <[email protected]>
Date: 2015-03-30T22:45:22Z
[SPARK-6492] Added timeout/retry logic to fix a deadlock in SparkContext
shutdown
commit df8224ff4621dc6974d56292af09281c31b4381e
Author: Ilya Ganelin <[email protected]>
Date: 2015-03-30T22:46:14Z
Added comment for added lock
----