GitHub user vanzin commented on the issue:
https://github.com/apache/spark/pull/19951
This seems like a race during shutdown:
- executor disconnects, which causes an "on disconnect" event to
be queued
- at the same time, the `stop()` thread ends up calling `Dispatcher.stop()`
which unregisters all endpoints and enqueues a message that stops each endpoint
receiver
- driver endpoint inbox is drained; the "on disconnect" callback is called and
the driver tries to send a message to itself, but because it has been
unregistered above, the send fails.
You could argue that what the RpcEnv is doing above is sort of fishy
(delivering messages to the endpoint after it's already been unregistered), but
this looks like an ok workaround.