Ravi Prakash created YARN-6054:
----------------------------------

             Summary: TimelineServer fails to start when some LevelDb state files are missing.
                 Key: YARN-6054
                 URL: https://issues.apache.org/jira/browse/YARN-6054
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.0.0-alpha2
            Reporter: Ravi Prakash


We encountered an issue recently where the TimelineServer failed to start 
because some state files went missing.

{code}
2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst

2016-11-21 20:46:43,135 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
        at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
        at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
        at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
        at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
        at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 5 more
2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1
{code}
Ideally, we shouldn't have any missing state files. However, I'd posit that the 
TimelineServer should degrade gracefully instead of failing to start at all.
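
One possible shape for such graceful degradation is sketched below, purely as an illustration and not as the actual LeveldbTimelineStore code. The names GracefulStoreOpen, StoreOpener and openOrStartFresh are hypothetical; the opener stands in for the JniDBFactory.open call seen in the stack trace, and the sketch assumes the corruption surfaces as an IOException. On failure, the damaged directory is moved aside for later inspection and the server starts with an empty store, which means the previously stored timeline data is no longer served.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; not part of Hadoop or leveldbjni. */
public class GracefulStoreOpen {

    /** Hypothetical stand-in for the real database open call. */
    @FunctionalInterface
    public interface StoreOpener<T> {
        T open(Path dir) throws IOException;
    }

    /**
     * Tries to open the store at dir. If that fails, the damaged directory
     * is renamed aside (not deleted) and a fresh, empty directory is opened
     * instead, so that startup can continue without the old data.
     */
    public static <T> T openOrStartFresh(Path dir, StoreOpener<T> opener)
            throws IOException {
        try {
            return opener.open(dir);
        } catch (IOException e) {
            // Keep the damaged files under a distinct sibling name.
            Path aside = dir.resolveSibling(
                dir.getFileName() + ".corrupt." + System.currentTimeMillis());
            System.err.println("Could not open " + dir + " (" + e.getMessage()
                + "); moving it to " + aside + " and starting with an empty store");
            if (Files.exists(dir)) {
                Files.move(dir, aside);
            }
            Files.createDirectories(dir);
            // A second failure is not caught here and propagates to the caller.
            return opener.open(dir);
        }
    }
}
```

Whether starting empty is acceptable, or whether a repair attempt should come first, is a policy decision that would probably need to be configurable.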



