[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701242#comment-14701242 ]

Jason Lowe commented on YARN-3942:
----------------------------------

[~rajesh], the initial exception looks like an issue in the HDFS client layer,
and most HDFS clients would have similar problems trying to use HDFS. Normally
HDFS operations are not retried at this level, because the HDFS client and
server layers already perform many retries. So I don't think that exception is
something to fix in the ATS; it points to the HDFS configuration and/or code.

Also, the patch does not treat that logged exception as fatal. It just logs
that it couldn't complete a scan for that iteration and tries again at the
next scan interval. The real problem is indicated by this line:
{noformat}
2015-08-18 01:03:35,600 [SIGTERM handler] ERROR 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
 RECEIVED SIGNAL 15: SIGTERM
{noformat}
Something outside of the ATS is killing the process with SIGTERM.
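The "log it and retry next interval" behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in YARN-3942.001.patch: the class `PeriodicScanSketch`, the `Scanner` interface, the `runOneIteration()` method, the log text, and the 100 ms interval are all hypothetical, and `java.util.logging` stands in for Hadoop's logging so the example is self-contained. The point it shows is that a failed scan is caught, logged, and counted, while the scheduler keeps running.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical sketch: a periodic scan whose failures are logged, not fatal.
public class PeriodicScanSketch {
  private static final Logger LOG =
      Logger.getLogger(PeriodicScanSketch.class.getName());

  /** Hypothetical stand-in for one scan of the HDFS directories. */
  interface Scanner {
    void scan() throws IOException;
  }

  private final Scanner scanner;
  private int completedScans = 0;
  private int failedScans = 0;

  PeriodicScanSketch(Scanner scanner) {
    this.scanner = scanner;
  }

  /**
   * Runs one scan iteration. An IOException is logged and counted but never
   * rethrown, so the next scheduled iteration simply tries again.
   */
  synchronized void runOneIteration() {
    try {
      scanner.scan();
      completedScans++;
    } catch (IOException e) {
      failedScans++;
      LOG.warning("Scan failed; will retry at the next interval: " + e);
    }
  }

  synchronized int completedScans() { return completedScans; }
  synchronized int failedScans() { return failedScans; }

  public static void main(String[] args) throws InterruptedException {
    // Simulated scanner (assumption): the first call fails, later calls succeed.
    int[] calls = {0};
    PeriodicScanSketch loop = new PeriodicScanSketch(() -> {
      if (calls[0]++ == 0) {
        throw new IOException("simulated HDFS client failure");
      }
    });

    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
    // If a scheduled task throws, the executor suppresses all later runs,
    // which is why runOneIteration() catches the exception itself.
    exec.scheduleWithFixedDelay(loop::runOneIteration, 0, 100, TimeUnit.MILLISECONDS);
    Thread.sleep(350);
    exec.shutdown();
    exec.awaitTermination(1, TimeUnit.SECONDS);
    System.out.println("completed=" + loop.completedScans()
        + " failed=" + loop.failedScans());
  }
}
```

Catching inside the task matters with `ScheduledExecutorService`: an uncaught exception from a periodic task silently cancels every future execution, which would turn a transient HDFS error into a permanently stalled scanner. Note that nothing in this pattern would stop the process; that still requires something external such as the SIGTERM shown in the log.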

> Timeline store to read events from HDFS
> ---------------------------------------
>
>                 Key: YARN-3942
>                 URL: https://issues.apache.org/jira/browse/YARN-3942
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: timelineserver
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
