agrawaldevesh opened a new pull request #29015: URL: https://github.com/apache/spark/pull/29015
### What changes were proposed in this pull request?

This PR allows an external agent to inform the Master that certain nodes (or host-ports) are being decommissioned.

### Why are the changes needed?

Currently, decommissioning is triggered by the Worker receiving a SIGPWR (out of band, possibly from some cleanup hook), which then informs the Master about it. This approach may not be feasible in environments that cannot trigger a cleanup hook on the Worker. In addition, when a large number of worker nodes are decommissioned at once, the Master receives a flood of messages.

This PR therefore adds a new POST endpoint `/workers/kill` on the MasterWebUI that allows an external agent to inform the Master about all the nodes being decommissioned in bulk. The workers are identified by either their `host:port` or just the host; in the latter case, all workers on that host are decommissioned. This API is merely a new entry point into the existing decommissioning logic; it does not change how the decommissioning request is handled at its core. A sketch of how an external agent might call this endpoint is shown below.

### Does this PR introduce _any_ user-facing change?

Yes, a new endpoint `/workers/kill` is added to the MasterWebUI. It is disabled by default.

### How was this patch tested?

Added unit tests.
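The following is a minimal sketch of how an external agent might call the new endpoint, assuming the hosts are passed as URL-encoded form parameters. Only the `/workers/kill` path, the POST method, and the `host` / `host:port` identification scheme come from this description; the parameter name `host`, the request encoding, and the Master web UI address are assumptions for illustration.

```scala
import java.net.{HttpURLConnection, URL, URLEncoder}
import java.nio.charset.StandardCharsets

object DecommissionWorkers {
  // Hypothetical client: POSTs a list of hosts (or host:port pairs) to the
  // Master's /workers/kill endpoint and returns the HTTP status code.
  def decommission(masterWebUi: String, hosts: Seq[String]): Int = {
    val url = new URL(s"$masterWebUi/workers/kill")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")

    // Each entry may be "host" or "host:port"; a bare host is taken to mean
    // every worker running on that host. The "host" parameter name is an
    // assumption, not taken from the PR description.
    val body = hosts
      .map(h => s"host=${URLEncoder.encode(h, "UTF-8")}")
      .mkString("&")

    val out = conn.getOutputStream
    try out.write(body.getBytes(StandardCharsets.UTF_8))
    finally out.close()

    conn.getResponseCode
  }

  def main(args: Array[String]): Unit = {
    // Example: decommission all workers on host-a and one specific worker on host-b.
    val status = decommission("http://spark-master:8080", Seq("host-a", "host-b:7078"))
    println(s"Master responded with HTTP $status")
  }
}
```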
