[ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802997#comment-15802997
 ] 

Ravi Prakash commented on YARN-6054:
------------------------------------

Hi Li Lu! Thanks for your review!
As you can see, I am trying to repair only once (in the catch block) when the 
service is inited. If the repair (or the subsequent open) fails and throws an 
IOException then we will again crash out and fail to start the TimelineServer. 
According to Jason's comment, and I agree, at that point we can't really do 
anything (maybe operations personnel would need to delete the entire database). 

> TimelineServer fails to start when some LevelDb state files are missing.
> ------------------------------------------------------------------------
>
>                 Key: YARN-6054
>                 URL: https://issues.apache.org/jira/browse/YARN-6054
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: YARN-6054.01.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>  failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: <levelDbStorePath>/timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: <levelDbStorePath>/timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
>  Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: 
> <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
>         at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 9 missing files; e.g.: 
> <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>         at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>         at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>         at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to