tgravescs commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-619031231


   It might have been discussed on the PR or buried in comments; I don't have 
time to go looking. There are many different conditions to consider. The main 
one we were focusing on is that you had as many executors as you needed to 
execute the tasks you had left, which means the allocation manager was not 
going to ask for more. The problem is that some or all of those executors can 
get blacklisted. The only way for the dynamic allocation manager to know it 
needs to ask for more is for it to know that nodes are blacklisted and that it 
needs some more executors - thus internally incrementing its count of 
executors needed and asking YARN (or another resource manager) for more. So 
now the number of executors the allocation manager is tracking is different 
from what its normal calculation would come up with. Simplified: #executors = 
(#tasks * #cpus per task / #cores per executor). So you have to change the 
number of executors, and you have to keep taking that into account, because 
the allocation manager is always recalculating whether it needs more or fewer 
executors. You also have to notify it when executors become unblacklisted, or 
perhaps when the blacklisted ones idle-timeout, etc. The allocation manager 
has to know a lot more details about the blacklisting and take them into 
account when it's calculating the number of executors it needs.
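
   To make the point concrete, here is a minimal sketch of the adjustment 
described above. This is not Spark's actual ExecutorAllocationManager code; 
the object and method names are hypothetical, and it only illustrates the 
simplified formula plus the extra executors needed to compensate for 
blacklisted ones:

```scala
// Hypothetical sketch - not the real ExecutorAllocationManager logic.
object AllocationSketch {
  // Baseline from the simplified formula in the comment:
  // #executors = ceil(#tasks * #cpus per task / #cores per executor)
  def baselineTarget(pendingTasks: Int, cpusPerTask: Int, coresPerExecutor: Int): Int =
    math.ceil(pendingTasks.toDouble * cpusPerTask / coresPerExecutor).toInt

  // Blacklisted executors cannot run tasks, so the manager has to bump its
  // internal count and ask the resource manager (e.g. YARN) for extras.
  def adjustedTarget(pendingTasks: Int, cpusPerTask: Int,
                     coresPerExecutor: Int, blacklistedExecutors: Int): Int =
    baselineTarget(pendingTasks, cpusPerTask, coresPerExecutor) + blacklistedExecutors

  def main(args: Array[String]): Unit = {
    // 100 tasks, 1 cpu per task, 4 cores per executor -> 25 executors
    println(adjustedTarget(100, 1, 4, blacklistedExecutors = 0))
    // Same workload with 5 executors blacklisted -> ask for 30
    println(adjustedTarget(100, 1, 4, blacklistedExecutors = 5))
  }
}
```

   The point of the sketch is the last part of the comment: the adjustment 
has to be re-applied on every recalculation, and reversed again when 
executors become unblacklisted or time out.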

