[jira] [Commented] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement

Prabhu Joseph (Jira) Tue, 04 Aug 2020 02:25:46 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170676#comment-17170676
 ]


Prabhu Joseph commented on YARN-10352:
--------------------------------------

Thanks [~bibinchundatt] for reviewing.

bq. The custom iterator how much improvement we have against the 
Iterators.filter ?

I used a custom iterator mainly to avoid the unnecessary null check that 
FindBugs requires when Iterators.filter is used with a predicate, as it was in 
[^YARN-10352-002.patch] - [Build Run|https://issues.apache.org/jira/browse/YARN-10352?focusedCommentId=17161295&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17161295]
 

{code}
node must be non-null but is marked as nullable At 
MultiNodeSortingManager.java:is marked as nullable At 
MultiNodeSortingManager.java:[lines 124-125]
{code}

That patch used a Predicate like the one below:

{code}
private Predicate<N> heartbeatFilter = new Predicate<N>() {
  @Override
  public boolean apply(final N node) {
    long timeElapsedFromLastHeartbeat =
        Time.monotonicNow() - node.getLastHeartbeatMonotonicTime();
    return timeElapsedFromLastHeartbeat <= (nmHeartbeatInterval * 2);
  }
};
{code}
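
For comparison, here is a minimal, self-contained sketch of what a custom filtering iterator with the same heartbeat check could look like. It is only an illustration, not the code from the attached patches: the class names (HeartbeatFilterSketch, Node, HeartbeatedNodeIterator) and the healthyNodeIds helper are hypothetical, the current time is passed in as a parameter instead of calling Time.monotonicNow(), and elements are assumed to be non-null, so no null check appears.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class HeartbeatFilterSketch {

  /** Hypothetical stand-in for a scheduler node; not a YARN class. */
  static final class Node {
    final String id;
    final long lastHeartbeatMonotonicTime;

    Node(String id, long lastHeartbeatMonotonicTime) {
      this.id = id;
      this.lastHeartbeatMonotonicTime = lastHeartbeatMonotonicTime;
    }
  }

  /** Wraps an iterator and yields only nodes that heartbeated recently. */
  static final class HeartbeatedNodeIterator implements Iterator<Node> {
    private final Iterator<Node> delegate;
    private final long now;
    private final long maxElapsedMs;
    private Node next; // buffered look-ahead element, null when empty

    HeartbeatedNodeIterator(Iterator<Node> delegate, long now,
        long heartbeatIntervalMs) {
      this.delegate = delegate;
      this.now = now;
      // Same threshold as the predicate: twice the heartbeat interval.
      this.maxElapsedMs = heartbeatIntervalMs * 2;
    }

    @Override
    public boolean hasNext() {
      // Advance past stale nodes until a fresh one is buffered.
      // Elements are assumed to be non-null.
      while (next == null && delegate.hasNext()) {
        Node candidate = delegate.next();
        if (now - candidate.lastHeartbeatMonotonicTime <= maxElapsedMs) {
          next = candidate;
        }
      }
      return next != null;
    }

    @Override
    public Node next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      Node result = next;
      next = null;
      return result;
    }
  }

  /** Collects the ids of the nodes that pass the heartbeat check. */
  static List<String> healthyNodeIds(List<Node> nodes, long now,
      long heartbeatIntervalMs) {
    List<String> ids = new ArrayList<>();
    Iterator<Node> it =
        new HeartbeatedNodeIterator(nodes.iterator(), now, heartbeatIntervalMs);
    while (it.hasNext()) {
      ids.add(it.next().id);
    }
    return ids;
  }

  public static void main(String[] args) {
    long now = 10_000L;
    List<Node> nodes = Arrays.asList(
        new Node("worker0", 1_000L),   // last heartbeat 9000 ms ago: skipped
        new Node("worker1", 9_500L));  // last heartbeat 500 ms ago: kept
    System.out.println(healthyNodeIds(nodes, now, 1_000L)); // prints [worker1]
  }
}
```

With a 1000 ms heartbeat interval the cutoff is 2000 ms, so in this example only worker1 is returned.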


Let me know if the custom iterator is fine, or whether the FindBugs warning can 
be ignored instead. I will fix the other two comments. Thanks.


 

> Skip schedule on not heartbeated nodes in Multi Node Placement
> --------------------------------------------------------------
>
>                 Key: YARN-10352
>                 URL: https://issues.apache.org/jira/browse/YARN-10352
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 3.3.0, 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>              Labels: capacityscheduler, multi-node-placement
>         Attachments: YARN-10352-001.patch, YARN-10352-002.patch, 
> YARN-10352-003.patch, YARN-10352-004.patch, YARN-10352-005.patch, 
> YARN-10352-006.patch
>
>
> When Node Recovery is enabled, stopping an NM does not unregister it from the 
> RM. The RM's active nodes therefore still include the stopped nodes until the 
> NM Liveliness Monitor expires them after the configured timeout 
> (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During these 10 
> minutes, Multi Node Placement still assigns containers to those nodes. It 
> needs to exclude nodes that have not heartbeated within the configured 
> heartbeat interval 
> (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms), similar to 
> the Asynchronous Capacity Scheduler Threads 
> (CapacityScheduler#shouldSkipNodeSchedule).
> *Repro:*
> 1. Enable Multi Node Placement 
> (yarn.scheduler.capacity.multi-node-placement-enabled) and Node Recovery 
> (yarn.node.recovery.enabled)
> 2. Have only one NM running, say worker0
> 3. Stop worker0 and start any other NM, say worker1
> 4. Submit a sleep job. The containers will time out because they are assigned 
> to the stopped NM worker0.



