tgravescs commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware 
resources scheduling in Standalone
URL: https://github.com/apache/spark/pull/25047#issuecomment-509653626
 
 
   Thanks for the explanation. For the resource leak, that makes sense, but it
doesn't cover all cases, which is why I was suggesting the PID directory and/or
checking with the master. For instance, let's say the worker crashes and you only
have one worker per node, so it leaves the assignment file lying around. If
you were tracking the assignments per worker PID, then when you start a new worker
on that node and it can't acquire enough resources, it could check whether the
PID currently assigned to those resources is still alive.
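A minimal Python sketch of the PID-based check, assuming a hypothetical JSON assignment file that maps each resource address to the PID of the worker holding it. The names `pid_alive` and `acquire` and the file format are illustrative only, not Spark's actual implementation. The sketch also omits the file locking a real worker would need when several workers share a node.

```python
import errno
import json
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permission, sends nothing
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def acquire(assignment_path, resource_name, all_addresses, needed, my_pid):
    """Claim `needed` addresses, reclaiming any held by dead worker PIDs.

    Hypothetical file format: {"gpu": {"0": 1234, "1": 5678}}.
    Returns the claimed addresses, or None if not enough are free.
    """
    try:
        with open(assignment_path) as f:
            assignments = json.load(f)
    except FileNotFoundError:
        assignments = {}
    held = assignments.setdefault(resource_name, {})
    # Drop entries whose owning worker process is no longer alive
    # (e.g. left behind by a crashed worker).
    for addr, pid in list(held.items()):
        if not pid_alive(pid):
            del held[addr]
    free = [a for a in all_addresses if a not in held]
    if len(free) < needed:
        return None
    chosen = free[:needed]
    for addr in chosen:
        held[addr] = my_pid
    with open(assignment_path, "w") as f:
        json.dump(assignments, f)
    return chosen
```

One caveat of this approach: PIDs can be reused, so if an unrelated process later receives the dead worker's PID, the check errs on the side of leaving the resource locked rather than assigning it twice.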
 
   Another option would be to track assignments by worker ID; when a new
worker starts, it checks with the master to see whether the old worker assigned
to the resources is still registered. The downside is that the old worker
process might, for some reason, still be running.
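The worker-ID alternative can be sketched the same way. Here `held` maps each address to a worker ID, and `live_worker_ids` stands in for whatever the master would report; both are hypothetical, and no existing master API is implied.

```python
from typing import Dict, List, Set


def reclaimable(held: Dict[str, str], live_worker_ids: Set[str]) -> List[str]:
    """Return addresses whose owning worker ID the master no longer reports.

    held: resource address -> worker ID (hypothetical assignment-file format).
    live_worker_ids: stand-in for a hypothetical query to the master.
    """
    return sorted(addr for addr, wid in held.items() if wid not in live_worker_ids)


# Example: the master only knows about worker-a, so GPU "1" looks reclaimable.
held = {"0": "worker-a", "1": "worker-b"}
print(reclaimable(held, {"worker-a"}))  # ['1']
```

This also illustrates the downside of relying on the master alone: if the process behind `worker-b` is still running but the master has dropped it, address `"1"` is reported as reclaimable and could end up assigned twice.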
