Github user squito commented on the issue:
https://github.com/apache/spark/pull/21068
> Actually, the only other thing I need to make sure of is that there aren't any delays if we now send the information from the yarn allocator back to the scheduler, since I assume the allocator would then need to get it back again from the scheduler. During that window the yarn allocator could be calling allocate() and updating things, so we need to make sure it gets the most up-to-date blacklist.
> Also, I need to double check, but the blacklist information isn't being sent to the yarn allocator when dynamic allocation is off, right? We would want that to happen.
Yeah, both good points. Actually, don't we want to update the general node blacklist on the yarn allocator even when dynamic allocation is off? I don't think it gets updated at all unless dynamic allocation is on; all the updates seem to originate in `ExecutorAllocationManager`, and the blacklist never actively pushes updates to the yarn allocator. That seems like an existing shortcoming (a rough sketch of what active pushing could look like is below).
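
To make that concrete, here is a purely illustrative sketch of the "push" direction; none of these names (`NodeBlacklistListener`, `DriverNodeBlacklist`, `FakeAllocator`) are real Spark classes, they just stand in for the roles of `BlacklistTracker` and `YarnAllocator`:

```scala
// Hypothetical sketch only: these types are stand-ins, not the actual Spark API.

// A listener the cluster-manager side (e.g. the YARN allocator) could register
// so it hears about node blacklist changes directly from the driver.
trait NodeBlacklistListener {
  def onNodeBlacklistUpdated(blacklistedNodes: Set[String]): Unit
}

// Driver-side tracker that pushes updates whenever a node is blacklisted,
// regardless of whether dynamic allocation is enabled.
class DriverNodeBlacklist {
  private var listeners = List.empty[NodeBlacklistListener]
  private var blacklisted = Set.empty[String]

  def addListener(l: NodeBlacklistListener): Unit = synchronized {
    listeners ::= l
    l.onNodeBlacklistUpdated(blacklisted) // send the current state right away
  }

  def blacklistNode(host: String): Unit = synchronized {
    blacklisted += host
    listeners.foreach(_.onNodeBlacklistUpdated(blacklisted))
  }
}

// Allocator-side stub: it just records the latest blacklist so the next
// allocate() call sees the most up-to-date view (the delay concern above).
class FakeAllocator extends NodeBlacklistListener {
  @volatile private var currentBlacklist = Set.empty[String]
  override def onNodeBlacklistUpdated(nodes: Set[String]): Unit =
    currentBlacklist = nodes
  def allocate(): Unit =
    println(s"allocating, avoiding nodes: $currentBlacklist")
}

object BlacklistPushDemo extends App {
  val tracker = new DriverNodeBlacklist
  val allocator = new FakeAllocator
  tracker.addListener(allocator)
  tracker.blacklistNode("bad-host-1")
  allocator.allocate() // prints: allocating, avoiding nodes: Set(bad-host-1)
}
```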
> do you know if mesos and/or kubernetes can provide this same information?
I don't know about Kubernetes at all. Mesos does provide info when a container fails. I don't think it lets you know the total cluster size, but that should be optional. Btw, node count is never going to be totally sufficient, as the remaining nodes might not actually be able to run your executors (smaller hardware, always taken up by higher-priority applications, other constraints in a framework like Mesos); it's always going to be best effort.
@attilapiros and I discussed this briefly yesterday: an alternative to moving everything into the BlacklistTracker on the driver is to just have some abstract base class, specialized slightly for each cluster manager. Then you could keep the flow like it is here, with the extra blacklisting still living in YarnAllocator.
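
Roughly, the abstract-base-class idea could look something like the sketch below; again, the names (`ClusterManagerBlacklist`, `YarnBlacklist`, `MesosBlacklist`) are hypothetical and only meant to show the shape of the split, not the classes in this PR:

```scala
// Hypothetical sketch of the "abstract base class" idea; names are made up.

// Common blacklist bookkeeping shared by every cluster manager.
abstract class ClusterManagerBlacklist {
  protected var blacklistedNodes = Set.empty[String]

  // Shared logic: record the node, then let the cluster manager react.
  final def blacklistNode(host: String): Unit = {
    blacklistedNodes += host
    onNodeBlacklisted(host)
  }

  // Each cluster manager plugs in its own behavior here.
  protected def onNodeBlacklisted(host: String): Unit
}

// The YARN flavor keeps its extra, allocator-level blacklisting (e.g. from
// allocation failures) local, as in this PR, and forwards the node to the
// resource manager's blacklist request.
class YarnBlacklist extends ClusterManagerBlacklist {
  override protected def onNodeBlacklisted(host: String): Unit =
    println(s"YARN: adding $host to the AM blacklist request")
}

// A Mesos flavor could instead decline offers from that node.
class MesosBlacklist extends ClusterManagerBlacklist {
  override protected def onNodeBlacklisted(host: String): Unit =
    println(s"Mesos: declining offers from $host")
}
```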