[jira] [Commented] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

Robert Kanter (JIRA) Fri, 09 Mar 2018 10:46:27 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393359#comment-16393359
 ]


Robert Kanter commented on YARN-7952:
-------------------------------------

Looks good overall, a few minor things:
 # We should do some sanity checking on 
{{NMLogAggregationStatusTracker#rollingInterval}} (e.g. not negative, etc)
 # There's a comment in {{NMLogAggregationStatusTracker}} that has the word 
"call" twice: "When we call call rollLogAggregationStatus..."
 # It would be good to add messages to the {{Assert}} statements in the tests
 # There are some {{assertTrue}} statements that should be {{assertEquals}}. 
For example:
{code:java}
Assert.assertTrue(reports.get(0).getLogAggregationStatus() == 
LogAggregationStatus.RUNNING);
{code}

 # In {{TestNMLogAggregationStatusTracker#testNMLogAggregationStatusUpdate}}, 
it sets the status to {{RUNNING}} and then to {{SUCCEEDED}} using 
{{System.currentTimeMillis() - 60 * 1000}} as the timestamp for both. While not 
likely, it is possible for these to end up being the same value if the code 
executes fast enough (I've actually seen this issue in YARN-7341); which will 
cause the second update to {{SUCCEEDED}} to be ignored, failing the test. 
Instead of calling {{System.currentTimeMillis()}} on each update, it would be 
best to do that once at the beginning of the test, and then reuse that "base 
time" throughout the test with different offsets - this will make the test more 
stable.
 # In {{TestNMLogAggregationStatusTracker#testLogAggregationStatusRoller}}, 
instead of the {{Thread.sleep}}, let's use {{GenericTestUtils#waitFor}}

> RM should be able to recover log aggregation status after restart/fail-over
> ---------------------------------------------------------------------------
>
>                 Key: YARN-7952
>                 URL: https://issues.apache.org/jira/browse/YARN-7952
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>            Priority: Major
>         Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, 
> YARN-7952.2.patch, YARN-7952.3.patch, YARN-7952.3.patch, YARN-7952.5.patch, 
> YARN-7952.6.patch, YARN-7952.7.patch
>
>
> Right now, the NM would send its own log aggregation status to RM 
> periodically to RM. And RM would aggregate the status for each application, 
> but it will not generate the final status until a client call(from web ui or 
> cli) trigger it. But RM never persists the log aggregation status. So, when 
> RM restarts/fails over, the log aggregation status will become “NOT_STARTED”. 
> This is confusing, maybe we should change it to “NOT_AVAILABLE” (will create 
> a separate ticket for this). Anyway, we need to persist the log aggregation 
> status for the future use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

Reply via email to