[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812459#comment-15812459 ]

Naganarasimha G R commented on YARN-6054:
-----------------------------------------

Thanks for the patch [~raviprakashu], 
bq. Also, as pointed out by Jason, (e.g. in the case of NM) graceful 
degradation would be a very hard thing to achieve. More likely, the state is 
corrupt and will cause undefined behavior.
Agreed, but maybe we can provide some kind of tool and a set of steps that can be
taken to recover from it, as we ran into this once too. But I agree it's not
within this JIRA's scope!
The changes look good enough. I will wait for the Jenkins report and, if there
are no further comments, commit it tomorrow!

> TimelineServer fails to start when some LevelDb state files are missing.
> ------------------------------------------------------------------------
>
>                 Key: YARN-6054
>                 URL: https://issues.apache.org/jira/browse/YARN-6054
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: YARN-6054.01.patch, YARN-6054.02.patch, 
> YARN-6054.03.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>         at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>         at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>         at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.
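The graceful degradation proposed in the issue could look roughly like the following Java sketch. It is not the YARN-6054 patch. DegradingStoreOpener, StoreOpener and CorruptStoreException are hypothetical names. In a real store, the opener would wrap the LevelDB open call that throws NativeDB$DBException in the log above. The sketch assumes that the safest fallback is to move the corrupt directory aside instead of deleting it. That keeps the files available for the kind of offline recovery steps mentioned in the comment, and the service can still start with an empty store.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch: open a directory-backed store and, if its on-disk
 * state is reported as corrupt, move the directory aside and start empty
 * instead of failing service init.
 */
final class DegradingStoreOpener {

  /** Hypothetical stand-in for a store-specific "state is corrupt" failure. */
  static class CorruptStoreException extends IOException {
    CorruptStoreException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical opener; a real one would wrap the LevelDB open call. */
  @FunctionalInterface
  interface StoreOpener<T> {
    T open(Path dir) throws IOException;
  }

  private DegradingStoreOpener() {}

  static <T> T openOrReset(Path dir, StoreOpener<T> opener) throws IOException {
    try {
      return opener.open(dir);
    } catch (CorruptStoreException e) {
      // Rename (not delete) the damaged directory so it can be inspected or
      // repaired offline. A sibling rename stays on the same file store.
      Path backup = dir.resolveSibling(
          dir.getFileName() + ".corrupt-" + System.currentTimeMillis());
      Files.move(dir, backup);
      System.err.println("Store at " + dir + " is corrupt (" + e.getMessage()
          + "); moved it to " + backup + " and starting with an empty store");
      // Recreate an empty directory and try once more. A second failure
      // propagates to the caller, so startup still fails loudly in that case.
      Files.createDirectories(dir);
      return opener.open(dir);
    }
  }
}
```

The trade-off is that history in the moved-aside directory is not served until someone restores or repairs it. Failing fast, as the current code does, at least makes that data loss impossible to miss.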



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
