[ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated YARN-6054:
-------------------------------
    Attachment: YARN-6054.02.patch

Thanks Li Lu! I agree with you. A repair operation definitely changes the 
LevelDb files. In this patch I am creating a backup of the corrupted database. 
I am consciously neglecting to do cleanup of old backups because I don't expect 
this to occur too often. If we want automatic cleanup of old backups I propose 
we punt that to another JIRA.

> TimelineServer fails to start when some LevelDb state files are missing.
> ------------------------------------------------------------------------
>
>                 Key: YARN-6054
>                 URL: https://issues.apache.org/jira/browse/YARN-6054
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: YARN-6054.01.patch, YARN-6054.02.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>  failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: <levelDbStorePath>/timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: <levelDbStorePath>/timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
>  Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: 
> <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
>         at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 9 missing files; e.g.: 
> <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>         at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>         at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>         at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to