[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778903#comment-16778903 ] Rakesh Shah commented on YARN-6054: --- Hi [~raviprak], As we can see the backup is taken while the first time when exception occurs, so there is a chance that there may be loss of data. As it is is trying to repair the level db directory with the backup one. And the backup one may not have all the files in backup ?. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2, 2.8.2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119164#comment-16119164 ] Ravi Prakash commented on YARN-6054: Sure Junping! I took the liberty of cherry-picking the change into branch-2.8 and branch-2.8.2 after running the unit test. Internally at my company we have backported this already and were running without problems because of this issue with Hadoop-2.7.3. Thanks for the suggestion of merging into 2.8.2 > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2, 2.8.2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119128#comment-16119128 ] Junping Du commented on YARN-6054: -- We actually hit this problem recently. Bump up to Critical as the failure will hang entire ATS server. Hi [~raviprak] and [~gtCarrera9], shall we consider to backport this fix to 2.8.2? > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815912#comment-15815912 ] Li Lu commented on YARN-6054: - Thanks [~raviprak]. The committed patch LGTM. Once the old file is backed up we don't need to worry if the repair process would make things worse. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815822#comment-15815822 ] Ravi Prakash commented on YARN-6054: Thanks Naganarasimha! Thanks for looking Li Lu! Please feel free to comment if you find anything and we'll get it in. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815695#comment-15815695 ] Naganarasimha G R commented on YARN-6054: - Ohh sorry, wasn't sure you were looking into it. May be next time will try to give lil more time. YARN-5934 was tracking the failed test case also missed to mention the same though i observed it ! > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815599#comment-15815599 ] Li Lu commented on YARN-6054: - Oops sorry [~Naganarasimha] I was trying to take a closer look at the updated patch, but never mind... Also, is the UT failure traced somewhere else? > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814609#comment-15814609 ] Hudson commented on YARN-6054: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11099 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11099/]) YARN-6054. TimelineServer fails to start when some LevelDb state files (naganarasimha_gr: rev 4c431a694059e40e78365b02a1497a6c7e479a70) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814607#comment-15814607 ] Naganarasimha G R commented on YARN-6054: - thanks for the contribution [~raviprakashu] and additional reviews from [~gtCarrera9]. Committed to trunk and branch-2. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812498#comment-15812498 ] Hadoop QA commented on YARN-6054: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 43s{color} | {color:red} hadoop-yarn-server-applicationhistoryservice in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 22m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6054 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846380/YARN-6054.03.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux a7d68c595185 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 287d3d6 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14609/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14609/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14609/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TimelineServer fails to start when some LevelDb state files are missing. > > >
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812459#comment-15812459 ] Naganarasimha G R commented on YARN-6054: - Thanks for the patch [~raviprakashu], bq. Also, as pointed out by Jason, (e.g. in the case of NM) graceful degradation would be a very hard thing to achieve. More likely, the state is corrupt and will cause undefined behavior. Agree, but may be we can give some kind of tool and set of steps which can be taken to over come it as we too faced it once. but agree its not within this jira's scope ! Changes look good enough will wait for the jenkins report and if no further comments will commit it tomorrow ! > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch, YARN-6054.02.patch, > YARN-6054.03.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812432#comment-15812432 ] Ravi Prakash commented on YARN-6054: Thanks Naganarasimha for your careful review! As I posted in the first comment, the repair did indeed fix the issue for us (we had a production incident.) As I'm sure you'll understand, we can't post the leveldb files in the open source. # I feel this JIRA is very specific to the TimelineServer so I am hesitant to include other daemons. Also, as pointed out by Jason, (e.g. in the case of NM) graceful degradation would be a very hard thing to achieve. More likely, the state is corrupt and will cause undefined behavior. # Fair point. Will do. # Great idea. Will do. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch, YARN-6054.02.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15808769#comment-15808769 ] Naganarasimha G R commented on YARN-6054: - Thanks [~raviprakashu] for the patch, overall patch looks fine technically, but has it been tested in in the actual scenario ? Assuming that you had encountered this and tried this option, i am asking it. Also in the test we are just ensuring that the api is just called, so if it has been tried and useful at least once then ok. Some points : # Additionally we are using LevelDb in multiple other places like NM state store etc, would it be good to handle in these places too as part of this jira itself ? # we are trying to backup the files hope test case could verify that scenario too. # {{setTestFactory}} can be annotated with VisibleForTesting and the name can be just {{setFactory}} > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch, YARN-6054.02.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804747#comment-15804747 ] Ravi Prakash commented on YARN-6054: The test failure is not related and seems to already have been reported in YARN-5934 > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch, YARN-6054.02.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804726#comment-15804726 ] Hadoop QA commented on YARN-6054: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 51s{color} | {color:red} hadoop-yarn-server-applicationhistoryservice in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 22m 47s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6054 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12845967/YARN-6054.02.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 53c645086550 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4a659ff | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/14587/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/14587/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/14587/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TimelineServer fails to start when some LevelDb state files are missing. > > >
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803022#comment-15803022 ] Li Lu commented on YARN-6054: - Thanks [~raviprak], fail the second attempt sounds like a right choice. I'm not very familiar with the repair method for leveldb jni, but would just like to verify that even though a repair fails, the data corruption will not be in a worsened form. We would like to avoid the case where the data was recoverable by some approaches (other than repair) but becomes not recoverable after a repair. Is this possible? Thanks! > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802997#comment-15802997 ] Ravi Prakash commented on YARN-6054: Hi Li Lu! Thanks for your review! As you can see, I am trying to repair only once (in the catch block) when the service is inited. If the repair (or the subsequent open) fails and throws an IOException then we will again crash out and fail to start the TimelineServer. According to Jason's comment, and I agree, at that point we can't really do anything (maybe operations personnel would need to delete the entire database). > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802926#comment-15802926 ] Li Lu commented on YARN-6054: - Thanks [~raviprak] for the patch! One quick concern is what will happen if the repair fails. IIUC we're repairing every time there are IOEs, will this cause any false alarms and/or accidentally make things worse? Thanks! > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash >Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799324#comment-15799324 ] Ravi Prakash commented on YARN-6054: Thanks to Jason's pointer for repairing the LevelDb [here|https://issues.apache.org/jira/browse/YARN-2873?focusedCommentId=14216259&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14216259], when we tried to "repair" the levelDb, the TS came up just fine. > TimelineServer fails to start when some LevelDb state files are missing. > > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Ravi Prakash > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: /timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > /timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org