Github user attilapiros commented on the issue:
https://github.com/apache/spark/pull/21068
@tgravescs what about removing the YARN_BLACKLIST_MAX_NODE_BLACKLIST_RATIO
config and, when the set of blacklisted nodes reaches numClusterNodes,
stopping the synchronisation of the blacklisted nodes toward YARN? That way
some nodes would still not be blacklisted on the YARN side (the previous
blacklisted state, so it differs from the state shown on the UI, but only
for a short time), and the failures would keep being counted, so eventually
the old mechanism using MAX_EXECUTOR_FAILURES (if configured) would stop
the app.
This way mostRelevantSubsetOfBlacklistedNodes() and the expiry of the
scheduler-blacklisted nodes can be removed from the code.
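A minimal sketch of the guard being proposed, assuming a pure helper; the name `shouldSyncBlacklist` is hypothetical and not part of Spark's actual internals:

```scala
// Hypothetical sketch of the proposal: only propagate the blacklist to
// YARN while at least one cluster node would remain schedulable. Once
// every node is blacklisted, syncing stops (YARN keeps the previous,
// smaller blacklist) and executor failures keep accumulating until the
// MAX_EXECUTOR_FAILURES mechanism (if configured) aborts the app.
object BlacklistSyncSketch {

  /** Returns true while syncing the blacklist to YARN is still safe. */
  def shouldSyncBlacklist(blacklistedNodes: Set[String], numClusterNodes: Int): Boolean =
    blacklistedNodes.size < numClusterNodes

  def main(args: Array[String]): Unit = {
    val blacklisted = Set("node1", "node2")
    // Three-node cluster: one node still free, keep syncing.
    println(shouldSyncBlacklist(blacklisted, 3)) // true
    // Two-node cluster: the whole cluster would be blacklisted, stop syncing.
    println(shouldSyncBlacklist(blacklisted, 2)) // false
  }
}
```

The guard keeps the YARN-side state slightly behind the scheduler's view, which is the short-lived UI discrepancy mentioned above.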
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]