uncleGen edited a comment on issue #26343: [SPARK-29683][YARN] Job will fail 
due to executor failures all available nodes are blacklisted
URL: https://github.com/apache/spark/pull/26343#issuecomment-612770299
 
 
    @attilapiros Adding a double check for `numClusterNodes != 0` does not actually 
resolve the issue. As @sjrand and I have pointed out, when the active RM switches 
after a failover, it takes a while for the NMs to re-register with the new RM, so 
`numClusterNodes` is temporarily a very small value. If it equals, say, 1, the 
`isAllNodeBlacklisted` check can still be triggered.
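    A minimal sketch of the race, just to illustrate the point. The names 
`numClusterNodes` and `isAllNodeBlacklisted` come from the code under discussion; 
the simplified logic below is an assumption for illustration, not the actual Spark 
implementation:

    ```scala
    // Illustrative sketch only, not the real YarnAllocatorBlacklistTracker code.
    // It shows why a `numClusterNodes != 0` guard alone is not sufficient.
    object BlacklistCheckSketch {

      // Hypothetical application-level blacklist state: one bad node so far.
      val blacklistedNodes = Set("node-1")

      // Simplified check: all nodes count as blacklisted once the blacklist
      // is at least as large as the cluster size currently known to the AM.
      def isAllNodeBlacklisted(numClusterNodes: Int): Boolean =
        blacklistedNodes.size >= numClusterNodes

      def main(args: Array[String]): Unit = {
        // Right after an RM failover only one NM has re-registered, so the
        // reported cluster size is 1 rather than the real size.
        val numClusterNodes = 1

        // The proposed guard passes (1 != 0) ...
        if (numClusterNodes != 0 && isAllNodeBlacklisted(numClusterNodes)) {
          // ... yet the check still fires, so the job would still be failed
          // because "all available nodes are blacklisted".
          println("All nodes considered blacklisted -> job fails")
        }
      }
    }
    ```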
