Github user markgrover commented on the pull request:
https://github.com/apache/spark/pull/8093#issuecomment-130167519
> guess we could do that. My concern is that the race is probably always
> going to be won by the executor disconnect message (instead of the explicit
> RemoveExecutor message), which means that most of the time these messages will
> still not show up in the driver UI...
@vanzin regarding the above, I gathered some data to help us out. I ran a job in
yarn-client mode that allocated a lot of ByteBuffers and had 1000 tasks. 72
of these tasks failed. For 70 of them the race was won by the onDisconnected
event, so they displayed a generic message. For the other 2 it was won by the
RemoveExecutor event, and they showed the error about YARN killing the
container in the UI. So, you are right.
However, I still think that even showing the generic message in the UI,
`Remote Rpc client disassociated. Likely due to containers exceeding
thresholds, or network issues. Check driver logs for WARNings`, is better than
the status quo. So opening a separate JIRA for the race condition, and exploring
the best way to proceed there, makes the most sense to me.