zhihai xu commented on YARN-3446:

Hi [~kasha], Thanks for the review! I attached a new patch YARN-3446.003.patch, 
which addresses your first comment. I also added more test cases to verify 
{{getHeadroom}} with blacklisted node removal and addition.
About your second comment: IMHO, without the optimization the overhead would be 
very large for a big cluster. For example, suppose we have 2000 AMs running on a 
5000-node cluster. Each AM would need to go through the list of 5000 nodes on 
every heartbeat to find the blacklisted {{SchedulerNode}}s; with 2000 AMs, that 
is 10,000,000 iterations. Normally the number of blacklisted nodes is very small 
for each application, so iterating over only the blacklisted nodes should not be 
a performance issue. Also, an AM won't change its blacklisted nodes frequently.
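To make the cost argument concrete, here is a minimal, self-contained sketch of the idea (the class and method names are illustrative, not the actual patch): track the total resource of blacklisted nodes incrementally as nodes are added to or removed from the blacklist, so each headroom query is O(1) instead of an O(cluster-size) scan per heartbeat.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the actual YARN-3446 code: maintain the blacklisted
// capacity incrementally so headroom never needs to scan the whole node map.
public class BlacklistTracker {
    private final Map<String, Integer> nodeMemoryByName = new ConcurrentHashMap<>();
    private final Set<String> blacklist = new HashSet<>();
    private int blacklistedMemoryMb = 0;

    public void addNode(String name, int memoryMb) {
        nodeMemoryByName.put(name, memoryMb);
    }

    // O(1) per blacklist change: only the affected node's resource is touched.
    public void addToBlacklist(String name) {
        if (blacklist.add(name)) {
            blacklistedMemoryMb += nodeMemoryByName.getOrDefault(name, 0);
        }
    }

    public void removeFromBlacklist(String name) {
        if (blacklist.remove(name)) {
            blacklistedMemoryMb -= nodeMemoryByName.getOrDefault(name, 0);
        }
    }

    // Headroom excludes blacklisted capacity without iterating the node list.
    public int headroom(int clusterAvailableMb) {
        return Math.max(0, clusterAvailableMb - blacklistedMemoryMb);
    }

    public static void main(String[] args) {
        BlacklistTracker t = new BlacklistTracker();
        t.addNode("node1", 1024);
        t.addNode("node2", 2048);
        t.addToBlacklist("node1");
        System.out.println(t.headroom(8192)); // 7168: node1's 1024 MB excluded
        t.removeFromBlacklist("node1");
        System.out.println(t.headroom(8192)); // 8192: blacklist empty again
    }
}
```

Since blacklist changes are rare and blacklists are small, the incremental bookkeeping stays cheap even with thousands of AMs.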
About your third comment: it is because {{SchedulerNode}}s are currently stored 
in {{AbstractYarnScheduler#nodes}} keyed by {{NodeId}}, but 
{{AppSchedulingInfo}} stores the blacklisted nodes as {{String}} node names or 
rack names. I can't find an easy way to translate a node name or rack name to a 
{{NodeId}}, so if we used {{AppSchedulingInfo#getBlacklist}}, we would need to 
iterate through {{AbstractYarnScheduler#nodes}} to find the blacklisted 
{{SchedulerNode}}s. For a 5000-node cluster, that means looping 5000 times, 
which is a big overhead. {{AbstractYarnScheduler#nodes}} is defined as follows:
  protected Map<NodeId, N> nodes = new ConcurrentHashMap<NodeId, N>();
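The key mismatch can be illustrated with a simplified stand-in (plain {{String}} keys of the form {{host:port}} in place of {{NodeId}}, and an {{Integer}} value in place of {{SchedulerNode}}; this is not the actual YARN code): because the blacklist holds only a host name, matching it against a {{NodeId}}-keyed map forces a scan of every entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of the lookup problem: node map keys carry "host:port"
// (standing in for NodeId), but the blacklist only knows the host name string.
public class NodeLookup {
    static List<String> findBlacklisted(Map<String, Integer> nodes,
                                        String blacklistedHost) {
        List<String> matches = new ArrayList<>();
        // O(cluster size) scan for each blacklisted name, since there is no
        // direct name-to-key index.
        for (String nodeId : nodes.keySet()) {
            if (nodeId.startsWith(blacklistedHost + ":")) {
                matches.add(nodeId);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Integer> nodes = new HashMap<>();
        nodes.put("host1:8041", 4096);
        nodes.put("host2:8041", 4096);
        System.out.println(findBlacklisted(nodes, "host1")); // [host1:8041]
    }
}
```

Avoiding this per-heartbeat scan is exactly why the patch iterates over the (small) blacklist directly rather than going through {{AppSchedulingInfo#getBlacklist}} and the full node map.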

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -------------------------------------------------------------------------
>                 Key: YARN-3446
>                 URL: https://issues.apache.org/jira/browse/YARN-3446
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers, because the headRoom used in the 
> reducer-preemption calculation includes blacklisted nodes. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.

This message was sent by Atlassian JIRA
