Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/5663#issuecomment-96734073
  
    Ok, I went and dug deeper into the YARN client, and it handles retries and things like RM rolling upgrades for you. So you can ignore my comment about retries, as it's handled there.
    
    My comment about yarn-cluster mode is because the YARN client, which runs on the gateway (where the application was launched from), calls monitorApplication to report status to the user.
     If just the gateway node loses its network connection to the RM, but everything else in the cluster is fine, it would report to the user that the application has been killed when it could really be running fine. So I think the main thing here would be to just report to the user that the state is unknown. For that, perhaps we can return FinalApplicationStatus.UNDEFINED instead.
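    To illustrate the idea (this is just a sketch, not Spark's actual monitorApplication code; the names `reportState` and `ReportedState` are hypothetical): when the gateway's poll of the RM fails with a connection error, map that to an "unknown" outcome rather than claiming the application was killed.

    ```scala
    import java.net.ConnectException

    // Hypothetical states the gateway might report back to the user.
    sealed trait ReportedState
    case object Finished extends ReportedState
    case object Killed extends ReportedState
    // Would correspond to FinalApplicationStatus.UNDEFINED in the YARN API.
    case object Unknown extends ReportedState

    // If polling the RM throws because the gateway lost its network
    // connection, the app may still be running fine elsewhere in the
    // cluster, so report Unknown instead of Killed.
    def reportState(poll: () => ReportedState): ReportedState =
      try poll()
      catch {
        case _: ConnectException => Unknown
      }
    ```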
    
    So what does YARN show as the application's status in this case (when run in yarn-client mode)? I'm guessing what happens is that the application master disconnects from the driver (onDisconnected) and then reports the status as succeeded?
    
    So if you bring the network back up, do you just continue to see these exceptions? If that is the case, perhaps we should be doing something different with our YARN client.

