Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/5663#issuecomment-96734073
Ok, I went and dug deeper into the YARN client, and it handles
retries and things like RM rolling upgrades for you. So you can ignore my
comment about retries, as it's handled there.
My comment about yarn-cluster mode is because the YARN client, which runs on
the gateway (where the application was launched from), calls monitorApplication
to report status to the user.
If just the gateway node loses its network connection to the RM, but
everything else in the cluster is fine, it would report to the user that the
application has been killed when it could really be running fine. So I think
the main thing here is to report to the user that the state is unknown. For
that, perhaps we can return FinalApplicationStatus.UNDEFINED instead.
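To illustrate the suggestion above, here is a minimal sketch of the reporting decision. This is not Spark's actual monitorApplication code; the enum and method names are hypothetical stand-ins for Hadoop's FinalApplicationStatus, used only to show that a lost RM connection should map to UNDEFINED rather than a guessed terminal state like KILLED:

```java
// Illustrative sketch only: FinalStatus mirrors (but is not) Hadoop's
// FinalApplicationStatus enum, and reportStatus is a hypothetical helper.
enum FinalStatus { SUCCEEDED, FAILED, KILLED, UNDEFINED }

class MonitorSketch {
    // If the gateway cannot reach the RM, the application's true state is
    // unknown, so report UNDEFINED instead of assuming it was killed.
    static FinalStatus reportStatus(boolean rmReachable, FinalStatus fromRm) {
        if (!rmReachable) {
            return FinalStatus.UNDEFINED;
        }
        return fromRm;
    }
}
```

The point is simply that the gateway's view of the RM is not the same as the application's actual state, so the report to the user should distinguish "I can't tell" from "it was killed."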
So what does YARN show as the status of the application in this case (when
run in yarn-client mode)? I'm guessing what happens is the application master
disconnects from the driver (onDisconnected) and then reports the status as
succeeded?
So if you bring the network back up, do you just continue to see these
exceptions? If that's the case, perhaps we should be doing something
different with our YARN client.