ankurdave edited a comment on pull request #34245:
URL: https://github.com/apache/spark/pull/34245#issuecomment-940401210
Hmm, thanks for pointing this out.
[`TaskContextImpl.markTask{Completed,Failed}`](https://github.com/apache/spark/blob/20051eb69904de6afc27fe5adb18bcc760c78701/core/src/main/scala/org/apache/spark/TaskContextImpl.scala#L121)
actually does hold the TaskContext lock while invoking the listeners. As a
result, I think the following sequence of events can produce a deadlock:
1. The main thread acquires the lock on TaskContextImpl and begins invoking
the task completion listeners.
2. The main thread interrupts the writer thread and waits for it to exit.
3. The writer thread
[handles](https://github.com/apache/spark/blob/20051eb69904de6afc27fe5adb18bcc760c78701/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L430)
the InterruptedException. The exception handler calls
`TaskContextImpl#isCompleted()`, which attempts to acquire the lock on
TaskContextImpl — but that lock is still held by the main thread, which in turn
is waiting for the writer thread to exit. Neither thread can proceed, so we
deadlock.
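The circular wait above can be sketched with plain JVM locks. This is an illustrative stand-alone example, not Spark code: `contextLock` plays the role of the TaskContextImpl monitor, and a timed `tryLock` stands in for the writer thread's blocked lock acquisition so the sketch terminates instead of actually hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    // Hypothetical stand-in for the lock on TaskContextImpl.
    static final ReentrantLock contextLock = new ReentrantLock();
    // Records whether the "writer" thread managed to take the lock.
    static volatile boolean writerGotLock = true;

    public static void main(String[] args) throws InterruptedException {
        // Step 1: main thread takes the context lock, as markTaskCompleted
        // does while invoking the completion listeners.
        contextLock.lock();
        try {
            Thread writer = new Thread(() -> {
                // Step 3: the writer's InterruptedException handler calls a
                // synchronized method (isCompleted), which needs the same
                // lock. A plain lock() here would block forever; tryLock
                // with a timeout lets the sketch demonstrate the blockage.
                try {
                    writerGotLock = contextLock.tryLock(200, TimeUnit.MILLISECONDS);
                    if (writerGotLock) contextLock.unlock();
                } catch (InterruptedException ignored) {
                }
            });
            writer.start();
            // Step 2: main thread waits for the writer to exit while still
            // holding the lock -- the circular wait.
            writer.join();
        } finally {
            contextLock.unlock();
        }
        System.out.println("writer acquired lock: " + writerGotLock);
    }
}
```

With a real (untimed) lock acquisition in the writer, `join()` would never return and both threads would hang, which is exactly the reported deadlock shape.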
We can fix this by releasing the TaskContext lock before invoking the
listeners. I'll update the PR with that change and try to write a test that
reproduces the deadlock.
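A minimal sketch of that fix, again with illustrative names rather than Spark's actual API: snapshot the listener list while holding the lock, then invoke the listeners after the lock is released, so listener code (or a thread it joins) can safely call back into synchronized methods like `isCompleted()`.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerFix {
    private final List<Runnable> listeners = new ArrayList<>();
    private boolean completed = false;

    public synchronized boolean isCompleted() {
        return completed;
    }

    public synchronized void addListener(Runnable l) {
        listeners.add(l);
    }

    public void markCompleted() {
        List<Runnable> snapshot;
        synchronized (this) {
            completed = true;
            // Copy the listeners under the lock so the snapshot is consistent.
            snapshot = new ArrayList<>(listeners);
        }
        // Invoke outside the lock: a listener that interrupts and joins a
        // writer thread no longer holds the monitor the writer needs.
        for (Runnable l : snapshot) {
            l.run();
        }
    }
}
```

In this shape, the step-2 `join()` happens without the lock held, so the writer's `isCompleted()` call in step 3 acquires the monitor immediately and the cycle is broken.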
cc @viirya @zsxwing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]