[ 
https://issues.apache.org/jira/browse/YARN-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967748#comment-15967748
 ] 

Jason Lowe commented on YARN-2985:
----------------------------------

Based on the description of this JIRA, I think there's some confusion here.  
Aggregated logs are deleted for non-MapReduce applications as long as the 
deletion service is running, whether that deletion service is hosted by the 
MapReduce job history server or somewhere else.  That's why the proposed patch 
is so small -- it's simply reusing the same code the JHS is already running.  
The log deletion service looks at the remote log directory in HDFS.  It doesn't 
filter the list of application logs it finds there based on whether it thinks 
the app is MapReduce or not; it just treats them all as generic applications.  
It happens to run in the MapReduce history server, but it is _not_ 
MapReduce-specific.  If users don't run MapReduce applications but still want 
log aggregation, they just need to run the MapReduce history server.  They 
won't use it for MapReduce job history since there are no MapReduce jobs, but 
that server will perform aggregated log retention for *all* applications.
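
To make the "no filtering by application type" point concrete, here is a 
minimal sketch of a retention sweep.  It is not the actual 
AggregatedLogDeletionService code: the class, the `AppLogDir` stand-in, and the 
`selectExpired` helper are all hypothetical, and the sketch assumes that a 
directory's modification time is the only thing compared against the retention 
cutoff (the real service may apply further checks).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the AggregatedLogDeletionService source. */
final class RetentionSweepSketch {

    /** Stand-in for one application's directory under the remote log dir. */
    static final class AppLogDir {
        final String appId;
        final String appType;           // carried only to show that it is ignored
        final long lastModifiedMillis;

        AppLogDir(String appId, String appType, long lastModifiedMillis) {
            this.appId = appId;
            this.appType = appType;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /**
     * Returns the ids of all applications whose log directory is older than
     * the retention period. The application type is never consulted.
     */
    static List<String> selectExpired(List<AppLogDir> dirs,
                                      long nowMillis,
                                      long retainSeconds) {
        long cutoffMillis = nowMillis - retainSeconds * 1000L;
        List<String> expired = new ArrayList<>();
        for (AppLogDir dir : dirs) {
            if (dir.lastModifiedMillis < cutoffMillis) {
                expired.add(dir.appId);
            }
        }
        return expired;
    }
}
```

Under these assumptions, a MapReduce app and a non-MapReduce app with equally 
old log directories are both selected for deletion, regardless of which daemon 
happens to host the sweep.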

Therefore this JIRA is really about adding the ability to relocate the 
aggregated log deletion service from the MapReduce job history server to the 
YARN timeline server.  We don't want two of these things running in the cluster 
if someone has deployed the MapReduce history server and the YARN timeline 
server.  That could lead to error messages in the logs as one of them goes to 
traverse/delete the logs just as the other is already deleting them.  However, 
we also don't want to just rip it out of the MapReduce history server and move 
it to the timeline server, because the timeline server is still an optional 
server in YARN.

So we need one of two things.  Either the user gets a way to specify where the 
deletion service runs: the legacy location in the MapReduce history server (for 
clusters that aren't going to run the still-optional timeline server) or the 
timeline server.  Or we just declare the timeline server a mandatory server to 
run (at least for log aggregation support) and move the deletion service from 
the history server to the timeline server.
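
The first option could look roughly like the sketch below.  Everything in it is 
an assumption made for illustration: the property name 
`example.log-aggregation.deletion-service.host` is invented and is not an 
existing YARN setting, the two accepted values are invented, and a plain `Map` 
stands in for the Hadoop configuration object.  The default is the legacy 
location so that existing clusters keep their current behavior.

```java
import java.util.Map;

/** Hypothetical illustration only; the property name and values are invented. */
final class DeletionServiceHostSketch {

    enum Host { JOB_HISTORY_SERVER, TIMELINE_SERVER }

    // Invented key for this sketch; not an existing YARN property.
    static final String HOST_KEY = "example.log-aggregation.deletion-service.host";

    /** Resolves which daemon is configured to host the deletion service. */
    static Host configuredHost(Map<String, String> conf) {
        // Default to the legacy location when the key is absent.
        String value = conf.getOrDefault(HOST_KEY, "job-history-server");
        switch (value) {
            case "job-history-server":
                return Host.JOB_HISTORY_SERVER;
            case "timeline-server":
                return Host.TIMELINE_SERVER;
            default:
                throw new IllegalArgumentException(
                    "Unknown value for " + HOST_KEY + ": " + value);
        }
    }

    /** Each daemon calls this at startup with its own identity. */
    static boolean shouldStartDeletionService(Host self, Map<String, String> conf) {
        return configuredHost(conf) == self;
    }
}
```

Because both daemons evaluate the same setting, at most one of them starts the 
service for any given configuration, which avoids the two-deleters race 
described above.  The same predicate is also a natural target for the per-server 
unit tests: assert that each server starts the service under one setting and 
skips it under the other.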

In addition the MapReduce history server supports dynamic refresh of the log 
deletion service configs, and it would be nice not to lose that ability when it 
is hosted in the timeline server.  That could be a separate JIRA unless we're 
ripping it out of the JHS.  If it can only run in the timeline server then we 
would lose refresh functionality unless that JIRA was completed.
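
As a rough picture of what "dynamic refresh" means for whichever server hosts 
the service, the sketch below re-reads the retention period on demand instead 
of only at startup.  The class and its `confLoader` supplier are hypothetical 
and a plain `Map` stands in for the Hadoop configuration; only the property 
name `yarn.log-aggregation.retain-seconds` (where -1 disables deletion) is a 
real YARN setting.  It does not show how the JHS exposes or triggers the 
refresh.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the JHS refresh implementation. */
final class RefreshableRetentionSketch {

    // Real YARN property; a value of -1 disables retention-based deletion.
    static final String RETAIN_KEY = "yarn.log-aggregation.retain-seconds";

    private final Supplier<Map<String, String>> confLoader;
    private volatile long retainSeconds;

    RefreshableRetentionSketch(Supplier<Map<String, String>> confLoader) {
        this.confLoader = confLoader;
        refresh();
    }

    /** Re-reads the configuration; intended to be called from an admin hook. */
    void refresh() {
        Map<String, String> conf = confLoader.get();
        retainSeconds = Long.parseLong(conf.getOrDefault(RETAIN_KEY, "-1"));
    }

    /** Value the next deletion pass would use. */
    long retainSeconds() {
        return retainSeconds;
    }
}
```

A host without an equivalent of `refresh()` would need a restart to pick up a 
new retention period, which is the functionality that would be lost.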

As for unit tests, I agree the existing tests for the deletion service cover 
the correctness of the service itself, so we just need unit tests for the 
timeline server and MapReduce JHS to verify each is starting the deletion 
service or not starting the service based on how the cluster is configured.

> YARN should support to delete the aggregated logs for Non-MapReduce 
> applications
> --------------------------------------------------------------------------------
>
>                 Key: YARN-2985
>                 URL: https://issues.apache.org/jira/browse/YARN-2985
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: log-aggregation, nodemanager
>    Affects Versions: 2.8.0
>            Reporter: Xu Yang
>            Assignee: Steven Rand
>         Attachments: YARN-2985-branch-2-001.patch
>
>
> Before Hadoop 2.6, the LogAggregationService is started in the NodeManager, 
> but the AggregatedLogDeletionService is started in MapReduce's 
> JobHistoryServer. Therefore, non-MapReduce applications can aggregate their 
> logs to HDFS but cannot delete those logs. The NodeManager needs to take over 
> the function of aggregated log deletion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

