[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state

Robert Kanter (JIRA) Mon, 06 Aug 2018 15:04:29 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570843#comment-16570843
 ]


Robert Kanter commented on YARN-4946:
-------------------------------------

{quote}Please also keep in mind that checkAppNumCompletedLimit is only invoked 
when an app becomes completed (APP_COMPLETED event dispatched from 
FinalTransition), so it can happen that we store more applications in the 
state-store and memory until another app finishes, but I can't think of any 
better solution currently.{quote}
I think that's fine.  The pressure to remove older completed apps comes from 
newer apps being completed, so if there are no new apps completed, we don't 
need to worry about this.  And when a newer app completes, this will trigger.

The 003 patch looks good overall.  Here's some more comments:
# Missing a newline at the end of {{RMAppImplTest}}.
# {{RMAppImplTest}} should be named {{TestRMAppImpl}} (we typically name tests 
{{TestFoo}} rather than {{FooTest}}).
# In {{RMAppImplTest}}, it would be best if we could avoid so much reflection 
to get access to internal state.  Instead, perhaps we can either get the App 
into that state by sending events (it looks like {{TestRMAppTransitions}} does 
this) or a {{\@VisibleForTesting}} method.  
#- Similarly, with {{completedAppsField}} in 
{{TestRMAppManager$TestRMAppManager}}.  It would be better to add a getter 
that's {{\@VisibleForTesting}} and package private.  Or given that 
{{TestRMAppManager$TestRMAppManager}} is a subclass of {{RMAppManager}}, you 
could make it protected.
# I'm not sure we need so many tests in {{RMAppImplTest}}.  The updated code in 
{{RMAppImpl#FinalTransition}} doesn't really do much with log aggregation now, 
and does pretty much the same thing whether it's enabled or not.  So having 
tests that check the behavior with various states of log aggregation seems 
unnecessary.  
#- In fact, the changes in {{RMAppImpl#FinalTransition}} are all just 
re-organizing the code - the logic is still the same, right?  And it's not 
really even related to this JIRA.  Perhaps these changes should be moved to a 
new JIRA that's aim is to make the code more readable?
# {{RMAppManager}} changes and tests look good.  
# The line declaring 
{{TestAppManager#createRMAppsMapMixedLogAggregationStatus}} is too long.


> RM should not consider an application as COMPLETED when log aggregation is 
> not in a terminal state
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4946
>                 URL: https://issues.apache.org/jira/browse/YARN-4946
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-4946.001.patch, YARN-4946.002.patch, 
> YARN-4946.003.patch
>
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each 
> Yarn App into a HAR file.  When run, it seeds the list by looking at the 
> aggregated logs directory, and then filters out ineligible apps.  One of the 
> criteria involves checking with the RM that an Application's log aggregation 
> status is not still running and has not failed.  When the RM "forgets" about 
> an older completed Application (e.g. RM failover, enough time has passed, 
> etc), the tool won't find the Application in the RM and will just assume that 
> its log aggregation succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> The RM should not consider an app to be fully completed (and thus removed 
> from its history) until the aggregation status has reached a terminal state 
> (e.g. SUCCEEDED, FAILED, TIME_OUT).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state

Reply via email to