[
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975920#comment-13975920
]
Bikas Saha commented on YARN-1506:
----------------------------------
This sleep is too long for a test. Same for other sleeps in the patch.
{code}+ while (alloc1Response.getAllocatedContainers().size() < 1) {
+ LOG.info("Waiting for containers to be created for app 1...");
+ Thread.sleep(1000);{code}
In the test is will be useful to have 2 outstanding container requests for that
machine (while the machine is fully booked) so we know that the scheduler will
be trying to allocate on that machine whenever the machine heartbeats. One
outstanding request should be for 2 GB and the other for 3GB. Also, after the
node resource is changed, the test should do a few node heartbeats to make sure
that the scheduler will try to allocate new container (for the outstanding
request) on that node and fail to do so (+ not hit any NPE). Then after the
first container completes, the test should check that the 2GB outstanding
container request is satisfiied but not the 3GB request. Thereafter, complete
the second container and verify that the 3GB requests is still not satisfied
(because the NM has only 2GB resource.
Which brings another question to my mind. What happens when this change
resource command reduces the NM size to something less than the max container
size allowed. e.g. Lets say that the NM is 8GB and max allowed container size
is 4GB. So the RM accepts a 4GB request. Now the admin changes NM to 2GB. At
this point the previously accepted 4GB request cannot be satisfied and the
application will get stuck. We may need to follow this up in a different jira.
There may be some existing jiras related to max container size and actual NM
resource size.
bq. Update transition as only 2 lines of code can be shared and shared a method
across different class seems over-kill in this case
Its probably about personal stylistic choices. The number of lines of code are
less important. What is more important is that having a shared method declares
that these pieces of code are related to each other in a logical way (if such a
relation exists). The dependency may be via a method in RMNodeImpl thats called
by both transitions or one transition extending the other one. The choice
depends on how the 2 pieces of code are related to each other.
> Replace set resource change on RMNode/SchedulerNode directly with event
> notification.
> -------------------------------------------------------------------------------------
>
> Key: YARN-1506
> URL: https://issues.apache.org/jira/browse/YARN-1506
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, scheduler
> Reporter: Junping Du
> Assignee: Junping Du
> Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch,
> YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch,
> YARN-1506-v5.patch, YARN-1506-v6.patch, YARN-1506-v7.patch,
> YARN-1506-v8.patch, YARN-1506-v9.patch
>
>
> According to Vinod's comments on YARN-312
> (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
> we should replace RMNode.setResourceOption() with some resource change event.
--
This message was sent by Atlassian JIRA
(v6.2#6252)