[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

Jason Lowe (JIRA) Mon, 11 Apr 2016 15:37:42 -0700

    [ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236117#comment-15236117 ]


Jason Lowe commented on YARN-4924:
----------------------------------

org.iq80.leveldb.DBException (the one we're interested in catching) is a
RuntimeException, and therefore doesn't have to be declared. Having JniDB
throw an IOException-derived database exception, only to have it translated
into a RuntimeException-derived exception by the org.iq80 DB wrapper, isn't
ideal for our use case, since we actually try to handle the I/O errors without
them being fatal.
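A minimal sketch of why the unchecked exception matters, assuming a
hypothetical store method that declares only IOException. The DBException
class below is a local stand-in that mirrors only the fact that
org.iq80.leveldb.DBException extends RuntimeException; the method names are
hypothetical. A caller that treats IOException as recoverable never sees the
raw DBException unless the store translates it:

```java
import java.io.IOException;

public class UncheckedDbErrorDemo {

    // Hypothetical stand-in for org.iq80.leveldb.DBException: it extends
    // RuntimeException, so it never has to appear in a throws clause.
    static class DBException extends RuntimeException {
        DBException(String msg) { super(msg); }
    }

    // Hypothetical store operation: declares only IOException, yet can
    // still throw the unchecked DBException.
    static void storeRaw() throws IOException {
        throw new DBException("simulated I/O error");
    }

    // Same operation, translating the unchecked exception into a checked
    // IOException so non-fatal IOException handlers see it.
    static void storeTranslated() throws IOException {
        try {
            storeRaw();
        } catch (DBException e) {
            throw new IOException(e);
        }
    }

    // Caller that treats IOException as recoverable and reports what
    // happened; the RuntimeException branch shows what slips past it.
    static String callAndHandle(boolean translate) {
        try {
            if (translate) {
                storeTranslated();
            } else {
                storeRaw();
            }
            return "ok";
        } catch (IOException e) {
            return "handled: " + e.getCause().getMessage();
        } catch (RuntimeException e) {
            return "escaped: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(callAndHandle(false)); // escaped: DBException
        System.out.println(callAndHandle(true));  // handled: simulated I/O error
    }
}
```

Without the translation, the DBException bypasses the IOException handler
entirely, which is exactly the case where a recoverable I/O error could end up
being fatal.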

I don't _think_ createWriteBatch will throw that particular exception in
practice, so we're probably OK. However, it's safer to put it in the try block,
given that we're at the mercy of the implementation not throwing that
exception in the future.
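A rough sketch of the try-block placement being suggested, assuming local
stand-ins for the org.iq80.leveldb DB, WriteBatch and DBException types (only
their shape is mirrored: WriteBatch is Closeable and DBException is unchecked).
The method storeContainerState and the FailingDB class are hypothetical,
included to simulate an implementation whose createWriteBatch starts throwing:

```java
import java.io.Closeable;
import java.io.IOException;

public class WriteBatchTryDemo {

    // Hypothetical stand-ins for the org.iq80.leveldb types.
    static class DBException extends RuntimeException {
        DBException(String msg) { super(msg); }
    }

    interface WriteBatch extends Closeable {
        WriteBatch put(byte[] key, byte[] value);
    }

    interface DB {
        WriteBatch createWriteBatch();
        void write(WriteBatch batch);
    }

    // Fake DB whose createWriteBatch() fails, simulating a future
    // implementation change.
    static class FailingDB implements DB {
        public WriteBatch createWriteBatch() {
            throw new DBException("createWriteBatch failed");
        }
        public void write(WriteBatch batch) { }
    }

    // createWriteBatch() sits inside the outer try, so a DBException from it
    // is translated into IOException like any other store failure; the inner
    // try/finally closes the batch once it exists.
    static void storeContainerState(DB db, byte[] key, byte[] value)
            throws IOException {
        try {
            WriteBatch batch = db.createWriteBatch();
            try {
                batch.put(key, value);
                db.write(batch);
            } finally {
                batch.close();
            }
        } catch (DBException e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) {
        try {
            storeContainerState(new FailingDB(), "k".getBytes(), "v".getBytes());
            System.out.println("stored");
        } catch (IOException e) {
            System.out.println("recoverable: " + e.getCause().getMessage());
        }
    }
}
```

If createWriteBatch were called before the try instead, its DBException would
propagate unchecked and skip the caller's IOException handling.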

> NM recovery race can lead to container not cleaned up
> -----------------------------------------------------
>
>                 Key: YARN-4924
>                 URL: https://issues.apache.org/jira/browse/YARN-4924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: sandflee
>         Attachments: YARN-4924.01.patch, YARN-4924.02.patch, 
> YARN-4924.03.patch, YARN-4924.04.patch
>
>
> It's probably a small window, but we observed a case where the NM crashed and
> then a container was not properly cleaned up during recovery.
> I will add details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
