[jira] [Comment Edited] (YARN-11020) [UI2] No container is found for an application attempt with a single AM container
[ https://issues.apache.org/jira/browse/YARN-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454442#comment-17454442 ] Szilard Nemeth edited comment on YARN-11020 at 12/7/21, 7:53 AM: - Thanks [~adam.antal] for chiming in. I agree that it would be good to fix this on the API side, but unfortunately that would break backward compatibility, so I think we need to live with this. I just committed the fix from [~gandras]. was (Author: snemeth): Thanks [~adam.antal] for chiming in. I agree that it would be good to fix this on the API side, but unfortunately that would break backward compatibility, so I think we need to live with this. > [UI2] No container is found for an application attempt with a single AM > container > - > > Key: YARN-11020 > URL: https://issues.apache.org/jira/browse/YARN-11020 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > In UI2, under the Logs tab of an application, a "No container data available" > message is shown if the application attempt only submitted a single container > (which is the AM container). 
> The culprit of the issue is that the response from YARN is not consistent, > because for a single container it looks like: > {noformat} > { > "containerLogsInfo": { > "containerLogInfo": [ > { > "fileName": "prelaunch.out", > "fileSize": "100", > "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021" > }, > { > "fileName": "directory.info", > "fileSize": "2296", > "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021" > }, > { > "fileName": "stderr", > "fileSize": "1722", > "lastModifiedTime": "Mon Nov 29 09:28:28 + 2021" > }, > { > "fileName": "prelaunch.err", > "fileSize": "0", > "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021" > }, > { > "fileName": "stdout", > "fileSize": "0", > "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021" > }, > { > "fileName": "syslog", > "fileSize": "38551", > "lastModifiedTime": "Mon Nov 29 09:28:28 + 2021" > }, > { > "fileName": "launch_container.sh", > "fileSize": "5013", > "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021" > } > ], > "logAggregationType": "AGGREGATED", > "containerId": "container_1638174027957_0008_01_01", > "nodeId": "da175178c179:43977" > } > }{noformat} > As for applications with multiple containers it looks like: > {noformat} > { > "containerLogsInfo": [{ > > }, { }] > }{noformat} > We can not change the response of the endpoint due to backward compatibility, > therefore we need to make UI2 be able to handle both scenarios. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
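The "handle both scenarios" idea from YARN-11020 above can be sketched as a small normalization step: whatever shape `containerLogsInfo` arrives in, convert it to a list before use. This is only an illustrative Java sketch, not the committed UI2 change (UI2 is a JavaScript application). It assumes the JSON response has already been parsed into plain `Map` and `List` objects, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; assumes the JSON was already parsed into Map/List objects. */
public class ContainerLogsInfoNormalizer {

    /**
     * Returns the value of "containerLogsInfo" as a list, whether the endpoint
     * sent a single object (one container) or an array (several containers).
     */
    @SuppressWarnings("unchecked")
    public static List<Map<String, Object>> normalize(Map<String, Object> response) {
        Object info = response.get("containerLogsInfo");
        if (info == null) {
            // Field absent: treat it as "no containers".
            return Collections.emptyList();
        }
        if (info instanceof List) {
            // Multiple containers: already an array, copy it as-is.
            return new ArrayList<>((List<Map<String, Object>>) info);
        }
        // Single container: wrap the lone object in a one-element list.
        return Collections.singletonList((Map<String, Object>) info);
    }
}
```

With this shape, the caller always iterates over a list and never needs to know which of the two response formats the endpoint returned.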
[jira] [Commented] (YARN-11020) [UI2] No container is found for an application attempt with a single AM container
[ https://issues.apache.org/jira/browse/YARN-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454442#comment-17454442 ] Szilard Nemeth commented on YARN-11020: --- Thanks [~adam.antal] for chiming in. I agree that it would be good to fix this on the API side, but unfortunately that would break backward compatibility, so I think we need to live with this.
[jira] [Resolved] (YARN-11020) [UI2] No container is found for an application attempt with a single AM container
[ https://issues.apache.org/jira/browse/YARN-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved YARN-11020. --- Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed
[jira] [Commented] (YARN-11030) ClassNotFoundException when aux service class is loaded from customized classpath
[ https://issues.apache.org/jira/browse/YARN-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454291#comment-17454291 ] Hiroyuki Adachi commented on YARN-11030: [~prabhujoseph] Thank you for your reply. I checked and confirmed that it is the same issue. Note that it applies not only to jars on HDFS but also to jars on the local filesystem. > ClassNotFoundException when aux service class is loaded from customized > classpath > - > > Key: YARN-11030 > URL: https://issues.apache.org/jira/browse/YARN-11030 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0, 3.3.1 >Reporter: Hiroyuki Adachi >Priority: Minor > > NodeManager failed to load the aux service with ClassNotFoundException while > loading the class from the customized classpath. > {noformat} > > > value="org.apache.spark.network.yarn.YarnShuffleService"/> > value="/tmp/spark-3.1.2-yarn-shuffle.jar"/> > > {noformat} > {noformat} > 2021-12-06 15:32:09,168 INFO org.apache.hadoop.util.ApplicationClassLoader: > classpath: [file:/tmp/spark-3.1.2-yarn-shuffle.jar] > 2021-12-06 15:32:09,168 INFO org.apache.hadoop.util.ApplicationClassLoader: > system classes: [org.apache.spark.network.yarn.YarnShuffleService] > 2021-12-06 15:32:09,169 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed > in > state INITED > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.ClassNotFoundException: > org.apache.spark.network.yarn.YarnShuffleService > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:482) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:761) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:327) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:494) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:962) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1042) > Caused by: java.lang.ClassNotFoundException: > org.apache.spark.network.yarn.YarnShuffleService > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:165) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.createAuxServiceFromLocalClasspath(AuxServices.java:242) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.createAuxService(AuxServices.java:271) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:452) > ... 
10 more > 2021-12-06 15:32:09,172 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > > failed in state INITED{noformat} > > YARN-9075 may be the cause of this problem: that patch changed the > default system classes. > Before YARN-9075: isSystemClass() returns false because the system classes do > not contain the aux service class itself, so the class is loaded from > the customized classpath. > [https://github.com/apache/hadoop/blob/rel/release-3.3.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ApplicationClassLoader.java#L176] > {noformat} > 2021-12-06 15:50:21,332 INFO org.apache.hadoop.util.ApplicationClassLoader: > classpath:
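The class-loading decision described in the YARN-11030 report above can be illustrated with a simplified sketch. It is not Hadoop's actual `ApplicationClassLoader` code: the matcher below is a reduced, hypothetical version that only handles exact class names and package prefixes ending in a dot, and all names in it are assumptions made for illustration. The point is that a class listed as a "system class" is delegated to the parent loader, which does not have the customized classpath, while any other class is looked up on the customized classpath first.

```java
import java.util.List;

/** Hypothetical, simplified model of a "system classes" check. */
public class SystemClassCheckSketch {

    /**
     * Simplified matcher: an entry ending in '.' is a package prefix,
     * any other entry must equal the class name exactly.
     */
    public static boolean isSystemClass(String className, List<String> systemClasses) {
        for (String entry : systemClasses) {
            if (entry.endsWith(".")) {
                if (className.startsWith(entry)) {
                    return true;
                }
            } else if (className.equals(entry)) {
                return true;
            }
        }
        return false;
    }

    /** Names the loader that gets the first attempt at loading the class. */
    public static String loaderFor(String className, List<String> systemClasses) {
        return isSystemClass(className, systemClasses) ? "parent" : "custom-classpath";
    }
}
```

In the failing log above, the system classes list contains the aux service class itself, so under this model the lookup goes to the parent loader, which would match the `ClassNotFoundException` raised from `URLClassLoader.findClass` in the stack trace.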
[jira] [Updated] (YARN-11028) Add metrics for container allocation latency
[ https://issues.apache.org/jira/browse/YARN-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-11028: -- Labels: pull-request-available (was: ) > Add metrics for container allocation latency > > > Key: YARN-11028 > URL: https://issues.apache.org/jira/browse/YARN-11028 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Updated] (YARN-11031) Improve the maintainability of RM webapp tests like TestRMWebServicesCapacitySched
[ https://issues.apache.org/jira/browse/YARN-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-11031: -- Labels: pull-request-available (was: ) > Improve the maintainability of RM webapp tests like > TestRMWebServicesCapacitySched > -- > > Key: YARN-11031 > URL: https://issues.apache.org/jira/browse/YARN-11031 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Tamas Domok >Assignee: Tamas Domok >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > It's hard to maintain the asserts in TestRMWebServicesCapacitySched, > TestRMWebServicesCapacitySchedDynamicConfig test classes when the scheduler > response is modified. Currently only a subset of the scheduler response is > asserted in these tests.
[jira] [Updated] (YARN-11032) Fix Wrong format in RMAPPImpl
[ https://issues.apache.org/jira/browse/YARN-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-11032: -- Labels: pull-request-available (was: ) > Fix Wrong format in RMAPPImpl > - > > Key: YARN-11032 > URL: https://issues.apache.org/jira/browse/YARN-11032 > Project: Hadoop YARN > Issue Type: Improvement > Components: RM >Affects Versions: 3.3.1 >Reporter: guophilipse >Assignee: guophilipse >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Fix Wrong format in RMAPPImpl
[jira] [Created] (YARN-11032) Fix Wrong format in RMAPPImpl
guophilipse created YARN-11032: -- Summary: Fix Wrong format in RMAPPImpl Key: YARN-11032 URL: https://issues.apache.org/jira/browse/YARN-11032 Project: Hadoop YARN Issue Type: Improvement Components: RM Affects Versions: 3.3.1 Reporter: guophilipse Assignee: guophilipse Fix Wrong format in RMAPPImpl
[jira] [Commented] (YARN-8234) Improve RM system metrics publisher's performance by pushing events to timeline server in batch
[ https://issues.apache.org/jira/browse/YARN-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453965#comment-17453965 ] Ashutosh Gupta commented on YARN-8234: -- Hi [~ziqian hu], I wanted to check whether you are still working on this. If not, I can take it up and raise a PR. > Improve RM system metrics publisher's performance by pushing events to > timeline server in batch > --- > > Key: YARN-8234 > URL: https://issues.apache.org/jira/browse/YARN-8234 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, timelineserver >Affects Versions: 2.8.3 >Reporter: Hu Ziqian >Assignee: Hu Ziqian >Priority: Critical > Attachments: YARN-8234-branch-2.8.3.001.patch, > YARN-8234-branch-2.8.3.002.patch, YARN-8234-branch-2.8.3.003.patch, > YARN-8234-branch-2.8.3.004.patch, YARN-8234.001.patch, YARN-8234.002.patch, > YARN-8234.003.patch, YARN-8234.004.patch > > > When the system metrics publisher is enabled, the RM pushes events to the timeline > server via its REST API. If the cluster load is heavy, many events are sent to the > timeline server and the timeline server's event handler thread gets locked. > YARN-7266 discusses this problem in detail. Because of the lock, the > timeline server can't receive events as fast as the RM generates them, and many > timeline events stay in the RM's memory. Eventually those events consume all of the > RM's memory, and the RM starts a full GC (which causes a JVM stop-the-world pause and > a timeout from the RM to ZooKeeper) or even hits an OOM. > The main problem here is that the timeline server can't receive events > as fast as they are generated. Currently the RM system metrics publisher puts only one event > in each request, so most of the time is spent handling HTTP headers and > network connection overhead on the timeline side. Only a small fraction of the time is spent processing > the timeline event itself, which is the only truly valuable work. 
> In this issue, we add a buffer to the system metrics publisher and let the publisher > send events to the timeline server in batches, with one request per batch. With the batch > size set to 1000, in our experiment the rate at which the timeline server receives > events improved by 100x. We have implemented this in our production > environment, which accepts 2 apps in one hour, and it works fine. > We add the following configuration: > * yarn.resourcemanager.system-metrics-publisher.batch-size: the number of events the > system metrics publisher sends in one request. The default value is 1000. > * yarn.resourcemanager.system-metrics-publisher.buffer-size: the size of the > event buffer in the system metrics publisher. > * yarn.resourcemanager.system-metrics-publisher.interval-seconds: when > batch publishing is enabled, the publisher must not wait indefinitely for a batch > to fill up while holding events in the buffer. So we add another > thread that periodically sends the events in the buffer. This setting is the > interval of that periodic sending thread. The default value is 60s.
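The batching scheme described in YARN-8234 above (collect events, send them once a batch is full, and also flush periodically so events never sit in the buffer too long) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the attached patches: the class name, the `sender` callback, and the synchronous flush are all hypothetical, and the separate buffer-size limit is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of size-triggered plus timer-triggered batch publishing. */
public class BatchingPublisherSketch<E> {
    private final int batchSize;
    private final Consumer<List<E>> sender; // stands in for "one request to the timeline server"
    private final List<E> buffer = new ArrayList<>();

    public BatchingPublisherSketch(int batchSize, Consumer<List<E>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Buffers one event and sends a batch as soon as batchSize events are pending. */
    public synchronized void publish(E event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is pending; meant to be called by a periodic timer as well. */
    public synchronized void flush() {
        if (buffer.isEmpty()) {
            return; // nothing to send, avoid empty requests
        }
        List<E> batch = new ArrayList<>(buffer);
        buffer.clear();
        sender.accept(batch);
    }
}
```

The periodic thread from the description could be modeled with the JDK's `ScheduledExecutorService`, for example `scheduler.scheduleAtFixedRate(publisher::flush, 60, 60, TimeUnit.SECONDS)`, where the 60-second period mirrors the stated default of interval-seconds.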
[jira] [Resolved] (YARN-9063) ATS 1.5 fails to start if RollingLevelDb files are corrupt or missing
[ https://issues.apache.org/jira/browse/YARN-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka resolved YARN-9063. - Fix Version/s: 3.4.0 2.10.2 3.2.4 3.3.3 Resolution: Fixed Committed to trunk, branch-3.3, branch-3.2, and branch-2.10. Thank you [~tarunparimi] for your report and thanks [~groot] for your contribution. > ATS 1.5 fails to start if RollingLevelDb files are corrupt or missing > - > > Key: YARN-9063 > URL: https://issues.apache.org/jira/browse/YARN-9063 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver, timelineservice >Affects Versions: 2.8.0 >Reporter: Tarun Parimi >Assignee: Ashutosh Gupta >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 2.10.2, 3.2.4, 3.3.3 > > Time Spent: 5.5h > Remaining Estimate: 0h > > ATS v1.5 fails to start up if there are some missing files in > RollingLevelDBTimelineStore. YARN-6054 fixes this issue only for the > LevelDBTimelineStore. Since RollingLevelDBTimelineStore opens multiple level > db and rolls them, we need a separate fix for this. 
The error is shown below > {code} > 18/11/13 07:00:56 FATAL applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 > missing files; e.g.: > /tmp/ats_folder/yarn/timeline/leveldb-timeline-store/owner-ldb/05.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:111) > > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:174) > > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:184) > > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 1 missing files; e.g.: > /tmp/ats-folder/yarn/timeline/leveldb-timeline-store/owner-ldb/05.sst > at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.serviceInit(RollingLevelDBTimelineStore.java:321) > > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > {code}
[jira] [Assigned] (YARN-9063) ATS 1.5 fails to start if RollingLevelDb files are corrupt or missing
[ https://issues.apache.org/jira/browse/YARN-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka reassigned YARN-9063: --- Assignee: Ashutosh Gupta (was: Tarun Parimi)
[jira] [Commented] (YARN-11030) ClassNotFoundException when aux service class is loaded from customized classpath
[ https://issues.apache.org/jira/browse/YARN-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453909#comment-17453909 ] Prabhu Joseph commented on YARN-11030: -- [~hadachi] Thanks for reporting the issue. It looks like this is a duplicate of [YARN-9967|https://issues.apache.org/jira/browse/YARN-9967]. Can you confirm? Thanks.