uncleGen edited a comment on issue #26343: [SPARK-29683][YARN] Job will fail due to executor failures all available nodes are blacklisted URL: https://github.com/apache/spark/pull/26343#issuecomment-612770299

@attilapiros Adding a double check for `numClusterNodes != 0` cannot actually resolve the issue. As @sjrand and I have pointed out, when the active RM switches after a failover, it takes a while for the NMs to register with the new RM, so `numClusterNodes` will be less than 1 during that window. This still triggers the `isAllNodeBlacklisted` check. It happens frequently in my local testing.
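For context, here is a minimal Scala sketch of the kind of guard being discussed; the names and structure are illustrative and are not the actual `YarnAllocatorBlacklistTracker` code. It shows how a transiently low node count right after an RM failover can still let the all-nodes-blacklisted path fire even with a non-zero guard in place:

```scala
// Hypothetical sketch of the guard under discussion; names and structure are
// illustrative, not the actual Spark YARN allocator implementation.
object BlacklistCheckSketch {

  // Node count as reported by the RM. Right after an RM failover this can lag
  // behind reality while NMs re-register, so it may be 0 or just a handful.
  var numClusterNodes: Int = 0

  // Nodes the allocator currently considers blacklisted.
  var blacklistedNodes: Set[String] = Set("node1", "node2")

  // The proposed double check: skip the comparison when the RM has not
  // reported any nodes yet.
  def isAllNodeBlacklisted: Boolean =
    numClusterNodes != 0 && blacklistedNodes.size >= numClusterNodes

  def main(args: Array[String]): Unit = {
    // Shortly after failover only one NM has re-registered, so the guard
    // passes (numClusterNodes != 0) but the comparison still concludes that
    // every node is blacklisted, which can fail the job prematurely.
    numClusterNodes = 1
    println(s"isAllNodeBlacklisted = $isAllNodeBlacklisted") // prints true
  }
}
```

The sketch is only meant to illustrate the timing window described above: as long as `numClusterNodes` lags behind the real cluster size, a simple non-zero check does not prevent the false positive.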
