tgravescs commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware resources scheduling in Standalone
URL: https://github.com/apache/spark/pull/25047#issuecomment-509653626

Thanks for the explanation. For the resource leak, that makes sense, but it doesn't cover all cases, which is why I was suggesting the PID directory and/or checking with the master. For instance, let's say the worker crashes and you only have one worker per node: it leaves the assignment file lying around. If you were tracking the assignments per worker PID, then when you start a new worker on that node and it can't acquire enough resources, it could check whether the PID currently assigned to the resources is still alive.

Another option would be to track assignments by worker ID; when a new worker starts, it checks with the master to see whether the old worker assigned to the resources is still there. The downside of that approach, though, is the case where the old worker process is for some reason still running.
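The PID-based check suggested above can be sketched roughly as follows. This is only an illustration in Python, not code from the PR: `pid_alive`, `reclaim_stale`, and the in-memory `assignments` dict (standing in for the on-disk assignment file) are hypothetical names, and the liveness probe assumes a POSIX system.

```python
import os
from typing import Callable, Dict, List


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 performs the existence/permission check without
        # delivering any signal to the target process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True


def reclaim_stale(
    assignments: Dict[int, List[str]],
    is_alive: Callable[[int], bool] = pid_alive,
) -> List[str]:
    """Drop assignments whose owning PID is dead and return the freed addresses.

    `assignments` maps a worker PID to the resource addresses it holds
    (hypothetical stand-in for the per-node assignment file).
    """
    freed: List[str] = []
    for pid in list(assignments):
        if not is_alive(pid):
            freed.extend(assignments.pop(pid))
    return freed
```

A new worker that cannot acquire enough resources would call something like `reclaim_stale(assignments)` before giving up. One caveat with this sketch is that the operating system can reuse PIDs, so a live PID does not strictly prove that the original worker is still the process holding it.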
