agrawaldevesh commented on a change in pull request #29015:
URL: https://github.com/apache/spark/pull/29015#discussion_r454492097



##########
File path: 
core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
##########
@@ -49,6 +56,23 @@ class MasterWebUI(
       "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = 
Set("POST")))
     attachHandler(createRedirectHandler(
       "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = 
Set("POST")))
+    attachHandler(createServletHandler("/workers/kill", new HttpServlet {
+      override def doPost(req: HttpServletRequest, resp: HttpServletResponse): 
Unit = {
+        val hostPorts: Seq[String] = Option(req.getParameterValues("host"))
+          .getOrElse(Array[String]()).map(_.toLowerCase(Locale.ROOT)).toSeq
+        logInfo(s"Received request to decommission workers $hostPorts from 
${req.getRemoteAddr}")
+        if (!isDecommissioningRequestAllowed(req)) {
+          resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED)
+        } else {
+          val removedWorkers = 
masterEndpointRef.askSync[Integer](DecommissionHostPorts(hostPorts))

Review comment:
       Yeah: We would wait for each one to be processed iteratively by the 
master's message handling thread. Having said that, the decommissioning does 
not block for actually sending/acking the messages to the executors. Its merely 
updating some (potentially persistent) state in the Master so shouldn't be that 
slow. 
   
   Having said that, would this be a problem ? I am assuming that the 
JettyHandler that the MasterWebUI is built atop can indeed handle multiple 
requests in flight, where some of them are blocking.
   
   The use case for making this handler synchronous is so that the external 
agent doing the decommissioning of the hosts can know whether the cleanup 
succeeded or not. While this information is scrapeable from the MasterPage 
(that returns the status of the Workers), it would require some brittle 
scraping on the external end point. So I figured it would be better for this 
call to return the number of workers it was actually able to decommission.
   
   I am happy to switch this logic to Async if you see any red flags.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to