[ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975920#comment-13975920
 ] 

Bikas Saha commented on YARN-1506:
----------------------------------

This sleep is too long for a test. Same for other sleeps in the patch.
{code}+    while (alloc1Response.getAllocatedContainers().size() < 1) {
+      LOG.info("Waiting for containers to be created for app 1...");
+      Thread.sleep(1000);{code}

In the test is will be useful to have 2 outstanding container requests for that 
machine (while the machine is fully booked) so we know that the scheduler will 
be trying to allocate on that machine whenever the machine heartbeats. One 
outstanding request should be for 2 GB and the other for 3GB. Also, after the 
node resource is changed, the test should do a few node heartbeats to make sure 
that the scheduler will try to allocate new container (for the outstanding 
request) on that node and fail to do so (+ not hit any NPE). Then after the 
first container completes, the test should check that the 2GB outstanding 
container request is satisfiied but not the 3GB request. Thereafter, complete 
the second container and verify that the 3GB requests is still not satisfied 
(because the NM has only 2GB resource.

Which brings another question to my mind. What happens when this change 
resource command reduces the NM size to something less than the max container 
size allowed. e.g. Lets say that the NM is 8GB and max allowed container size 
is 4GB. So the RM accepts a 4GB request. Now the admin changes NM to 2GB. At 
this point the previously accepted 4GB request cannot be satisfied and the 
application will get stuck. We may need to follow this up in a different jira. 
There may be some existing jiras related to max container size and actual NM 
resource size.

bq. Update transition as only 2 lines of code can be shared and shared a method 
across different class seems over-kill in this case
Its probably about personal stylistic choices. The number of lines of code are 
less important. What is more important is that having a shared method declares 
that these pieces of code are related to each other in a logical way (if such a 
relation exists). The dependency may be via a method in RMNodeImpl thats called 
by both transitions or one transition extending the other one. The choice 
depends on how the 2 pieces of code are related to each other.

> Replace set resource change on RMNode/SchedulerNode directly with event 
> notification.
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1506
>                 URL: https://issues.apache.org/jira/browse/YARN-1506
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
> YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, 
> YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch, 
> YARN-1506-v8.patch, YARN-1506-v9.patch
>
>
> According to Vinod's comments on YARN-312 
> (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
>  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to