[ 
https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713280#comment-13713280
 ] 

Omkar Vinit Joshi commented on YARN-245:
----------------------------------------

Thanks [~mayank_bansal] for the patch. I agree that checking heartbeat ids 
will test this issue. A few comments:
{code}
+      conf.setBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, true);
{code}

Why are we doing this?

{code}
+      NodeStatus nodeStatus = request.getNodeStatus();
+      nodeStatus.setResponseId(heartBeatID++);
{code}
Is this required? Can it be removed?

* There is one issue at present with NodeStatusUpdaterImpl.java: if we get 
such a duplicate heartbeat response, we will not wait but will try again 
immediately. Check the finally {} block, which won't get executed; the NM will 
keep pinging the RM until it gets a response with the correct response-id. 
Should we wait, or request again immediately? Thoughts?
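
To make the wait-versus-retry question concrete, here is a minimal sketch of one option: keep the wait in a finally block so that skipping a duplicate response with continue still pauses before the next ping. This is not the actual NodeStatusUpdaterImpl code; HeartbeatLoopSketch, ResourceManagerStub, and run are hypothetical names, and the RM is reduced to a stub that returns a response id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from NodeStatusUpdaterImpl.
class HeartbeatLoopSketch {

    /** Stand-in for the RM side: takes the last id the NM accepted, returns a response id. */
    interface ResourceManagerStub {
        int heartbeat(int lastAcceptedResponseId);
    }

    /** Sends a fixed number of heartbeats and returns a log of what happened. */
    static List<String> run(ResourceManagerStub rm, int heartbeats, long intervalMs) {
        List<String> events = new ArrayList<>();
        int lastResponseId = 0;
        for (int i = 0; i < heartbeats && !Thread.currentThread().isInterrupted(); i++) {
            try {
                int responseId = rm.heartbeat(lastResponseId);
                if (responseId != lastResponseId + 1) {
                    // Duplicate or out-of-order response: skip processing.
                    events.add("ignored " + responseId);
                    continue; // the finally block below still runs
                }
                lastResponseId = responseId;
                events.add("accepted " + responseId);
            } finally {
                // Waiting here means a duplicate does not trigger an immediate re-ping.
                try {
                    Thread.sleep(intervalMs);
                    events.add("waited");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        return events;
    }

    public static void main(String[] args) {
        int[] replies = {1, 1, 2}; // the second reply repeats id 1
        int[] call = {0};
        System.out.println(run(last -> replies[call[0]++], replies.length, 10L));
    }
}
```

With the stub above, the duplicate id 1 is ignored, but the loop still waits one interval before asking again. Whether the real updater should wait or retry at once is the open question.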

{code}
+        Thread.sleep(1000l);
{code}
Can we make it 1000 (or 1000L)? The lowercase l suffix reads like a 1.

* The test will need a timeout. I see there are certain tests without one, 
however. If adding a timeout, use a slightly larger value. :) 

{code}
+      if (nodeStatus.getKeepAliveApplications() != null
+          && nodeStatus.getKeepAliveApplications().size() > 0) {
+        for (ApplicationId appId : nodeStatus.getKeepAliveApplications()) {
+          List<Long> list = keepAliveRequests.get(appId);
+          if (list == null) {
+            list = new LinkedList<Long>();
+            keepAliveRequests.put(appId, list);
+          }
+          list.add(System.currentTimeMillis());
+        }
+      }
{code}
{code}
+      if (heartBeatID == 2) {
+        LOG.info("Sending FINISH_APP for application: [" + appId + "]");
+        this.context.getApplications().put(appId, mock(Application.class));
+        nhResponse.addAllApplicationsToCleanup(Collections.singletonList(appId));
+      }
{code}

{code}
+      rt.context.getApplications().remove(rt.appId);
{code} 

{code}
+    private Map<ApplicationId, List<Long>> keepAliveRequests =
+        new HashMap<ApplicationId, List<Long>>();
+    private ApplicationId appId = BuilderUtils.newApplicationId(1, 1);
{code}

Do we need this? Can we remove all the application-related stuff? Since we are 
now checking only heartbeat ids, we can remove it. Thoughts?
                
> Node Manager can not handle duplicate responses
> -----------------------------------------------
>
>                 Key: YARN-245
>                 URL: https://issues.apache.org/jira/browse/YARN-245
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>            Reporter: Devaraj K
>            Assignee: Mayank Bansal
>         Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, YARN-245-trunk-3.patch
>
>
> {code:xml}
> 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
