tgravescs commented on a change in pull request #24374: [SPARK-27366][CORE] 
Support GPU Resources in Spark job scheduling
URL: https://github.com/apache/spark/pull/24374#discussion_r289179092
 
 

 ##########
 File path: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 ##########
 @@ -263,7 +272,7 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
         val workOffers = activeExecutors.map {
           case (id, executorData) =>
             new WorkerOffer(id, executorData.executorHost, 
executorData.freeCores,
-              Some(executorData.executorAddress.hostPort))
+              Some(executorData.executorAddress.hostPort), 
executorData.availableResources.toMap)
 
 Review comment:
   Oh sorry, you are right, I misread, and that is actually worse I think. I don't think we want to allow the taskScheduler to modify a data structure owned by the CoarseGrainedSchedulerBackend. At this point free cores are handled separately: we create the WorkerOffers, which keep their own bookkeeping separate from the executorDataMap in CoarseGrainedSchedulerBackend. The task scheduler can do whatever it wants with the WorkerOffers, and then the CoarseGrainedSchedulerBackend is responsible for updating the executorDataMap.
   
   If they are kept totally separate you don't have to worry about the executorDataMap being wrong if the serializedTask is too large or if something else fails before launchTasks. If you keep them together you have to be very sure that anything that fails after reservation also unreserves, which I think is more of a maintenance issue. If you miscount in one iteration of the task scheduler loop dealing with WorkerOffers it is kind of OK, in the sense that the next makeOffers call will fix it up and have the right data, since it will be pulled from the separate executorDataMap.
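   To illustrate the self-healing property argued above, here is a minimal sketch. The `ExecutorData` and `WorkerOffer` case classes below are simplified stand-ins for the real Spark classes, and `makeOffers` is reduced to its essence: because each round of offers is derived fresh from the backend-owned map, a bookkeeping mistake confined to one round's offers does not persist.

```scala
import scala.collection.mutable

// Hypothetical simplified versions of the scheduler-backend state; the real
// Spark classes carry more fields.
case class ExecutorData(host: String, var freeCores: Int)
case class WorkerOffer(executorId: String, host: String, cores: Int)

// State owned by the backend (stand-in for executorDataMap).
val executorDataMap = mutable.Map(
  "exec-1" -> ExecutorData("host-a", 4),
  "exec-2" -> ExecutorData("host-b", 2))

// Offers are rebuilt from executorDataMap on every call, so they carry
// bookkeeping separate from the backend's own state.
def makeOffers(): Seq[WorkerOffer] =
  executorDataMap.toSeq.map { case (id, data) =>
    WorkerOffer(id, data.host, data.freeCores)
  }

// Even if the task scheduler mangles one round's offers...
val round1 = makeOffers().map(o => o.copy(cores = o.cores - 1))
// ...the next round is derived from executorDataMap again and is correct.
val round2 = makeOffers()
assert(round2.find(_.executorId == "exec-1").get.cores == 4)
```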
   
   So I think we should make a deep copy of that map to send into the WorkerOffer. Normally I wouldn't expect much to be in there, so it shouldn't be too expensive to do that. But it does get called a lot if you have a lot of executors, so it might be interesting to see what the overhead is when you have thousands of executors. I'll try to look at it some more tomorrow.
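   A sketch of why a deep copy matters here, assuming the resource bookkeeping is a mutable map of resource name to available addresses (a simplification of the real structure): a shallow `.toMap` still shares the inner mutable collections, so the task scheduler could mutate state owned by the backend. Snapshotting the inner collections too avoids that.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the executor's resource bookkeeping:
// resource name -> currently available addresses, e.g. "gpu" -> ["0", "1"].
val availableResources: mutable.Map[String, mutable.Buffer[String]] =
  mutable.Map("gpu" -> mutable.Buffer("0", "1"))

// Deep copy: a shallow .toMap would keep references to the inner mutable
// buffers; converting each buffer with .toList snapshots it as well.
def deepCopy(res: mutable.Map[String, mutable.Buffer[String]]): Map[String, Seq[String]] =
  res.map { case (name, addrs) => name -> addrs.toList }.toMap

val offerResources = deepCopy(availableResources)

// Mutating the backend-owned map no longer leaks into the offer's snapshot.
availableResources("gpu") -= "0"
assert(offerResources("gpu") == List("0", "1"))
```

   The per-call cost is one new map plus one small list per resource type, which supports the point above: cheap per executor, but worth measuring when makeOffers runs over thousands of executors.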

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
