tgravescs commented on issue #25047: [WIP][SPARK-27371][CORE] Support GPU-aware 
resources scheduling in Standalone
URL: https://github.com/apache/spark/pull/25047#issuecomment-509276365
 
 
   I have a few general questions; note I haven't looked at all of the code yet.
   
    I'm not an expert in standalone mode, but it supports both a client mode and 
a cluster mode. In your description, are you saying that even client mode will 
use the resource file and lock it? How do you know whether the client is running 
on a node with GPUs, or on a worker node? I guess as long as the allocation 
logic is the same it doesn't matter. This is one thing we never handled in YARN: 
in client mode the user is on their own for resource coordination.
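   As a rough illustration of what "use the resource file and lock it" could mean here, this is a minimal sketch of cross-process coordination via an exclusive OS-level lock on a shared file. The class name, file layout, and helper are hypothetical, not Spark's actual mechanism:

```java
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical sketch: a client-mode driver and workers on the same node
// coordinate GPU claims by taking an exclusive lock on a shared resources
// file before reading/updating it. Names here are illustrative only.
public class ResourceFileLock {
    /** Runs action while holding the lock; returns false if the lock is busy. */
    public static boolean withLock(String path, Runnable action) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileLock lock = raf.getChannel().tryLock()) {
            if (lock == null) {
                return false;              // held by another process
            }
            action.run();                  // read/claim GPUs under the lock
            return true;
        } catch (OverlappingFileLockException e) {
            return false;                  // already locked within this JVM
        }
    }

    public static void main(String[] args) throws Exception {
        String path = java.io.File.createTempFile("spark-resources", ".lock").getPath();
        boolean ok = withLock(path, () -> System.out.println("claiming GPUs under lock"));
        System.out.println("acquired=" + ok);
    }
}
```

   Whether this works for client mode still depends on the driver and workers agreeing on the lock file's location, which is the coordination question raised above.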
   
   It seems unreliable to assume you have multiple workers per node (for the 
case where a worker crashes). When the worker dies it automatically kills any 
executors, correct? Is there a chance it doesn't?
   It feels like you really want something like worker restart and recovery: if 
a worker crashes and you restart it, it should come back with the same ID, 
rediscover what it had reserved before, and possibly pick up any executors still 
running. But that is probably a much bigger change. Standalone uses a PID dir to 
tell which workers are running, correct? It could use this to track and check 
allocated resources. If you track the PID with each assignment, then when a new 
worker or driver starts it could check whether there are enough resources left 
for it to allocate, or rely on what the Master says, or really both: when a new 
Worker comes up, it asks the Master which workers are supposed to be on that 
node and then checks process existence.
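   The PID-based reclamation idea above could be sketched roughly as follows. The map from owning PID to reserved GPU indices is an assumed bookkeeping structure, not Spark's actual one:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: drop resource assignments whose owning process has
// died, so a restarted worker (or the Master) can reclaim those GPUs.
// Uses Java 9+ ProcessHandle to check process existence by PID.
public class StaleReclaim {
    public static Map<Long, List<Integer>> reclaim(Map<Long, List<Integer>> assignments) {
        Map<Long, List<Integer>> live = new HashMap<>();
        assignments.forEach((pid, gpus) -> {
            boolean alive = ProcessHandle.of(pid)
                                         .map(ProcessHandle::isAlive)
                                         .orElse(false);
            if (alive) {
                live.put(pid, gpus);       // keep assignments of live owners only
            }
        });
        return live;
    }

    public static void main(String[] args) {
        long me = ProcessHandle.current().pid();
        Map<Long, List<Integer>> m = new HashMap<>();
        m.put(me, List.of(0));
        m.put(999_999_999L, List.of(1));   // almost certainly not a live PID
        System.out.println(reclaim(m).keySet());
    }
}
```

   A PID check alone can't distinguish a crashed worker from one that was deliberately stopped, which is why cross-checking against the Master's view of expected workers, as suggested above, seems useful.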
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]

