Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5277#issuecomment-87938070
> Do you mean calling EventLoop.stop in dagScheduler.onError?
I was referring to the race that I reported in
[SPARK-6492](https://issues.apache.org/jira/browse/SPARK-6492) (referenced by
this PR): `EventLoop.onError()` calls `SparkContext.stop()`, which blocks waiting
for `SPARK_CONTEXT_CONSTRUCTOR_LOCK`, while another thread has simultaneously
called `SparkContext.stop()`, acquired `SPARK_CONTEXT_CONSTRUCTOR_LOCK`, and
called `DAGScheduler.stop()`, which in turn calls `EventLoop.stop()`. My comment
was about that `EventLoop.stop()` call blocking indefinitely while it waits to
`join()` the event processing thread, which is itself blocked trying to acquire
`SPARK_CONTEXT_CONSTRUCTOR_LOCK`.
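For reference, here is a minimal sketch of that interleaving with stand-in names (nothing here is the real Spark code; the lock, latch, and method names are all hypothetical). When run, it deadlocks deterministically:

```scala
import java.util.concurrent.CountDownLatch

// Sketch of the two-thread deadlock: the event thread blocks on the constructor
// lock while the thread holding that lock blocks forever join()-ing the event thread.
object TwoThreadDeadlockSketch {
  private val constructorLock = new Object   // stands in for SPARK_CONTEXT_CONSTRUCTOR_LOCK
  private val lockAcquired = new CountDownLatch(1)
  @volatile private var eventThread: Thread = _

  // Stands in for SparkContext.stop(): grabs the constructor lock, then stops the scheduler.
  private def contextStop(): Unit = constructorLock.synchronized {
    lockAcquired.countDown()
    schedulerStop()
  }

  // Stands in for DAGScheduler.stop() -> EventLoop.stop(): joins the event thread.
  private def schedulerStop(): Unit = {
    eventThread.join()   // never returns: the event thread is waiting on constructorLock
  }

  def main(args: Array[String]): Unit = {
    eventThread = new Thread(() => {
      lockAcquired.await()   // simulate onError firing after the user thread took the lock
      contextStop()          // blocks on constructorLock, held by the user thread -> deadlock
    }, "event-loop-thread")
    eventThread.start()

    contextStop()            // user thread: takes the lock, then join()s the event thread
  }
}
```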
Based on the comments upthread from @ilganeli, I don't think that we should
adopt my earlier suggestion of having `EventLoop.stop()` be a no-op whenever
another thread is already in the process of stopping the EventLoop: that could
allow in-progress cleanup to still be running after a user's call to
`SparkContext.stop()` has returned, so cleanup could be skipped if the JVM
exited at that point.
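To make that concern concrete, here is a hypothetical sketch of the "no-op if already stopping" semantics (an illustration of the hazard, not any existing Spark code): the second caller's `stop()` returns immediately while the first caller's cleanup is still in flight.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical illustration of why a no-op stop() is risky: the caller gets no
// guarantee that cleanup has completed by the time stop() returns.
object NoOpStopSketch {
  private val stopping = new AtomicBoolean(false)

  def stop(): Unit = {
    if (!stopping.compareAndSet(false, true)) {
      // Another thread is already stopping: return immediately, even though
      // that thread's cleanup may still be running.
      return
    }
    Thread.sleep(5000)       // simulate slow cleanup work (e.g. flushing event logs)
    println("cleanup finished")
  }

  def main(args: Array[String]): Unit = {
    new Thread(() => stop(), "first-stopper").start()
    Thread.sleep(100)        // let the first thread begin cleanup
    stop()                   // returns right away because cleanup is already in progress
    // If the user exited the JVM here, the 5-second cleanup above would be cut short.
  }
}
```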
I am worried about the case where calling `EventLoop.stop()` from the event
loop thread itself (which can happen transitively here) leads to a one-threaded
deadlock, but I guess your other patch addresses this?
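Just to spell out the single-thread case I mean: if `EventLoop.stop()` ends up running on the event loop thread itself, a plain `join()` can never return, since the thread would be waiting for its own termination. A sketch of the hazard and the usual guard (hypothetical names, and not necessarily how your other patch handles it):

```scala
// Hypothetical sketch of the self-join hazard and a common guard against it;
// this is not the actual EventLoop implementation.
class EventLoopSketch {
  @volatile private var stopped = false

  private val eventThread: Thread = new Thread(() => {
    while (!stopped) {
      // ... dequeue and process events; a fatal error here may end up calling
      // stop() transitively from this very thread ...
      Thread.sleep(10)
    }
  }, "event-loop")

  def start(): Unit = eventThread.start()

  def stop(): Unit = {
    stopped = true
    if (Thread.currentThread() ne eventThread) {
      eventThread.join()   // safe: we are on a different thread
    }
    // If we are *on* the event thread, skip the join(): joining the current
    // thread blocks forever, because it can never terminate while waiting on itself.
  }
}
```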