[ 
https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485330#comment-13485330
 ] 

Bikas Saha commented on YARN-72:
--------------------------------

Do we intend to stall shutdown indefinitely for this activity?
{code}
+    while (!containers.isEmpty()) {
+      try {
+        Thread.sleep(1000);
+      } catch (InterruptedException ex) {
+        LOG.warn("Interrupted while sleeping on container kill", ex);
+      }
+    }
{code}
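
One way to bound the wait is sketched below. This is only an illustration, not part of the patch: the class name, the waitForContainers helper, and the timeoutMs/pollMs parameters are hypothetical, and a plain Map stands in for the NM's container map. It polls as the loop above does, but gives up once a deadline passes or the thread is interrupted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class BoundedContainerWait {

  /**
   * Polls until the container map is empty, the timeout elapses,
   * or the calling thread is interrupted.
   * Returns true only if the map is empty on return.
   * (Hypothetical helper; names and parameters are assumptions.)
   */
  public static boolean waitForContainers(Map<String, ?> containers,
      long timeoutMs, long pollMs) {
    long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    // Compare via subtraction so the check stays correct if nanoTime wraps.
    while (!containers.isEmpty() && System.nanoTime() - deadline < 0) {
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException ex) {
        // Restore the interrupt flag and stop waiting instead of looping on.
        Thread.currentThread().interrupt();
        break;
      }
    }
    return containers.isEmpty();
  }

  public static void main(String[] args) {
    Map<String, Object> containers = new ConcurrentHashMap<>();
    containers.put("container_01", new Object());
    // Nothing removes the entry, so the wait ends at the deadline.
    boolean drained = waitForContainers(containers, 200, 50);
    System.out.println("drained=" + drained); // prints "drained=false"
  }
}
```

With something like this, shutdown could proceed (or escalate to a forced kill) when the method returns false, rather than blocking forever.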

This patch deserves some good tests to verify the new functionality.

Overall the approach seems reasonable but I will defer to someone with a better 
understanding of NM.

I was wondering if the NM could make itself part of a process group (e.g. via 
setsid) so that everything it spawns is also part of that process group. The 
process group could then be configured to terminate if the NM root process 
dies, and the OS would take care of cleaning up the orphaned processes. This 
might solve both YARN-72 and YARN-73. Is something like this possible?
                
> NM should handle cleaning up containers when it shuts down ( and kill 
> containers from an earlier instance when it comes back up after an unclean 
> shutdown )
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-72
>                 URL: https://issues.apache.org/jira/browse/YARN-72
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>            Assignee: Sandy Ryza
>         Attachments: YARN-72.patch
>
>
> Ideally, the NM should wait for a limited amount of time when it gets a 
> shutdown signal for existing containers to complete and kill the containers ( 
> if we pick an aggressive approach ) after this time interval. 
> For NMs which come up after an unclean shutdown, the NM should look through 
> its directories for existing container.pids and try to kill any existing 
> containers matching the pids found. 

