[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701242#comment-14701242 ]
Jason Lowe commented on YARN-3942:
----------------------------------

[~rajesh] the initial exception looks like an issue with the HDFS client layer, and most HDFS clients would have similar problems trying to use HDFS. Normally HDFS operations are not retried at the application level, because the HDFS client and server layers already perform many retries. So I don't think that exception is something to fix in the ATS; it points to the HDFS configuration and/or code.

Also, the patch does not treat that logged exception as fatal. It just logs the fact that it couldn't complete a scan for that iteration, and it will try again at the next scan interval.

The real problem is indicated by this line:

{noformat}
2015-08-18 01:03:35,600 [SIGTERM handler] ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM
{noformat}

Something outside of the ATS is killing the process with SIGTERM.

> Timeline store to read events from HDFS
> ---------------------------------------
>
>                 Key: YARN-3942
>                 URL: https://issues.apache.org/jira/browse/YARN-3942
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: timelineserver
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS
> v2. The intent of this plugin is to provide a workable solution for running
> the Tez UI against the timeline server on large-scale clusters running many
> thousands of jobs per day.
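The scan behavior described in the comment (log a failed scan, then try again at the next interval instead of treating the failure as fatal) can be sketched roughly as follows. This is a minimal illustration only: `PeriodicScanner` and `ScanTask` are hypothetical names, not classes from YARN-3942.001.patch, and the sketch uses `java.util.logging` rather than Hadoop's logging to stay self-contained.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: a periodic scanner that treats a failed scan as
 * non-fatal. The failure is logged and the next scheduled run tries again.
 */
public class PeriodicScanner {
  /** One scan pass, e.g. listing an HDFS directory (assumed interface). */
  @FunctionalInterface
  public interface ScanTask {
    void scan() throws IOException;
  }

  private static final Logger LOG = Logger.getLogger(PeriodicScanner.class.getName());

  private final ScanTask task;
  private final AtomicInteger completed = new AtomicInteger();
  private final AtomicInteger failed = new AtomicInteger();

  public PeriodicScanner(ScanTask task) {
    this.task = task;
  }

  /** Runs a single scan and never lets an exception escape. */
  public void runOneScan() {
    try {
      task.scan();
      completed.incrementAndGet();
    } catch (IOException | RuntimeException e) {
      // Not fatal: record the failure and wait for the next interval.
      failed.incrementAndGet();
      LOG.log(Level.WARNING, "Unable to complete scan for this iteration", e);
    }
  }

  /**
   * Schedules runOneScan with a fixed delay between runs. Catching exceptions
   * inside the task matters: ScheduledExecutorService suppresses all later
   * runs of a periodic task once one run throws.
   */
  public ScheduledFuture<?> start(ScheduledExecutorService exec, long interval, TimeUnit unit) {
    return exec.scheduleWithFixedDelay(this::runOneScan, 0, interval, unit);
  }

  public int completedScans() { return completed.get(); }

  public int failedScans() { return failed.get(); }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger calls = new AtomicInteger();
    // Fails on the first call only, simulating a transient HDFS client error.
    PeriodicScanner scanner = new PeriodicScanner(() -> {
      if (calls.incrementAndGet() == 1) {
        throw new IOException("simulated HDFS client failure");
      }
    });
    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
    scanner.start(exec, 10, TimeUnit.MILLISECONDS);
    Thread.sleep(200);
    exec.shutdownNow();
    exec.awaitTermination(1, TimeUnit.SECONDS);
    System.out.println("failed=" + scanner.failedScans()
        + " completed=" + scanner.completedScans());
  }
}
```

`scheduleWithFixedDelay` is chosen here so that a slow scan (for example, one stuck inside HDFS client retries) does not cause runs to pile up; the next scan starts one interval after the previous one finishes.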