[jira] [Comment Edited] (YARN-11020) [UI2] No container is found for an application attempt with a single AM container

2021-12-06 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454442#comment-17454442
 ] 

Szilard Nemeth edited comment on YARN-11020 at 12/7/21, 7:53 AM:
-

Thanks [~adam.antal] for chiming in.
I agree with you that it'd be good to fix this on the API side, but unfortunately 
it would break backward compatibility, so I think we need to live with this.
I just committed the fix from [~gandras].


was (Author: snemeth):
Thanks [~adam.antal] for chiming in.
I agree with you that it'd be good to fix this on the API side, but unfortunately 
it would break backward compatibility, so I think we need to live with this.


> [UI2] No container is found for an application attempt with a single AM 
> container
> -
>
> Key: YARN-11020
> URL: https://issues.apache.org/jira/browse/YARN-11020
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In UI2, under the Logs tab of an application, the message "No container data 
> available" is shown if the application attempt only submitted a single container 
> (which is the AM container). 
> The cause of the issue is that the response from YARN is inconsistent: 
> for a single container it looks like:
> {noformat}
> {
>     "containerLogsInfo": {
>         "containerLogInfo": [
>             {
>                 "fileName": "prelaunch.out",
>                 "fileSize": "100",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             },
>             {
>                 "fileName": "directory.info",
>                 "fileSize": "2296",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             },
>             {
>                 "fileName": "stderr",
>                 "fileSize": "1722",
>                 "lastModifiedTime": "Mon Nov 29 09:28:28 + 2021"
>             },
>             {
>                 "fileName": "prelaunch.err",
>                 "fileSize": "0",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             },
>             {
>                 "fileName": "stdout",
>                 "fileSize": "0",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             },
>             {
>                 "fileName": "syslog",
>                 "fileSize": "38551",
>                 "lastModifiedTime": "Mon Nov 29 09:28:28 + 2021"
>             },
>             {
>                 "fileName": "launch_container.sh",
>                 "fileSize": "5013",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             }
>         ],
>         "logAggregationType": "AGGREGATED",
>         "containerId": "container_1638174027957_0008_01_01",
>         "nodeId": "da175178c179:43977"
>     }
> }{noformat}
> For applications with multiple containers it looks like:
> {noformat}
> {
>     "containerLogsInfo": [{
>         
>     }, {  }]
> }{noformat}
> We cannot change the response of the endpoint due to backward compatibility, 
> therefore we need to make UI2 able to handle both scenarios.
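
The "handle both scenarios" idea can be illustrated with a small sketch. The actual fix lives in the UI2 JavaScript code and is not shown in this thread; the Java below is only an illustration of the normalization step, assuming the JSON has already been parsed into plain Map and List values. The class name ContainerLogsNormalizer and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; this is not the committed UI2 fix. */
final class ContainerLogsNormalizer {
    private ContainerLogsNormalizer() {}

    /**
     * Takes the already-parsed value of "containerLogsInfo" and always returns a list.
     * A JSON array is copied element by element, a single JSON object is wrapped
     * in a one-element list, and a missing value becomes an empty list.
     */
    static List<Map<String, Object>> normalize(Object containerLogsInfo) {
        if (containerLogsInfo == null) {
            return Collections.emptyList();
        }
        if (containerLogsInfo instanceof List<?>) {
            List<Map<String, Object>> result = new ArrayList<>();
            for (Object element : (List<?>) containerLogsInfo) {
                result.add(asObject(element));
            }
            return result;
        }
        return Collections.singletonList(asObject(containerLogsInfo));
    }

    /** Rejects anything that is not a parsed JSON object. */
    @SuppressWarnings("unchecked")
    private static Map<String, Object> asObject(Object value) {
        if (!(value instanceof Map)) {
            throw new IllegalArgumentException("Expected a JSON object but got: " + value);
        }
        return (Map<String, Object>) value;
    }
}
```

With this shape, the rest of the consumer only ever iterates over a list, regardless of whether the attempt had one container or several.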



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11020) [UI2] No container is found for an application attempt with a single AM container

2021-12-06 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454442#comment-17454442
 ] 

Szilard Nemeth commented on YARN-11020:
---

Thanks [~adam.antal] for chiming in.
I agree with you that it'd be good to fix this on the API side, but unfortunately 
it would break backward compatibility, so I think we need to live with this.





[jira] [Resolved] (YARN-11020) [UI2] No container is found for an application attempt with a single AM container

2021-12-06 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth resolved YARN-11020.
---
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed




[jira] [Commented] (YARN-11030) ClassNotFoundException when aux service class is loaded from customized classpath

2021-12-06 Thread Hiroyuki Adachi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454291#comment-17454291
 ] 

Hiroyuki Adachi commented on YARN-11030:


[~prabhujoseph] Thank you for your reply. I confirmed that it is the same issue. 
Actually, it applies not only to jars on HDFS but also to jars on the local 
filesystem.

> ClassNotFoundException when aux service class is loaded from customized 
> classpath
> -
>
> Key: YARN-11030
> URL: https://issues.apache.org/jira/browse/YARN-11030
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0, 3.3.1
>Reporter: Hiroyuki Adachi
>Priority: Minor
>
> NodeManager failed to load the aux service with ClassNotFoundException while 
> loading the class from the customized classpath.
> {noformat}
> 
>   
>    value="org.apache.spark.network.yarn.YarnShuffleService"/>
>    value="/tmp/spark-3.1.2-yarn-shuffle.jar"/>
>   
>  {noformat}
> {noformat}
> 2021-12-06 15:32:09,168 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> classpath: [file:/tmp/spark-3.1.2-yarn-shuffle.jar]
> 2021-12-06 15:32:09,168 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> system classes: [org.apache.spark.network.yarn.YarnShuffleService]
> 2021-12-06 15:32:09,169 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed 
> in
>  state INITED
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.ClassNotFoundException: 
> org.apache.spark.network.yarn.YarnShuffleService
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:482)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:761)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>         at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:327)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>         at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:494)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:962)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1042)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.spark.network.yarn.YarnShuffleService
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>         at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
>         at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:348)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:165)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.createAuxServiceFromLocalClasspath(AuxServices.java:242)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.createAuxService(AuxServices.java:271)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.initAuxService(AuxServices.java:452)
>         ... 10 more
> 2021-12-06 15:32:09,172 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  
> failed in state INITED{noformat}
>  
> YARN-9075 may have caused this problem: that patch changed the default system 
> classes.
> Before YARN-9075: isSystemClass() returns false since the system classes do 
> not contain the aux service class itself, so the class is loaded from 
> the customized classpath.
> [https://github.com/apache/hadoop/blob/rel/release-3.3.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ApplicationClassLoader.java#L176]
> {noformat}
> 2021-12-06 15:50:21,332 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> classpath: 
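
The isSystemClass() behaviour described above can be modelled with a small sketch. It is a simplified model, not the real ApplicationClassLoader: it assumes, as the description and stack trace suggest, that a class matched as a system class is looked up only through the parent loader, and it uses exact name matching only. The class name DelegationModel, the enum and the method are hypothetical, and the classpaths are represented as plain sets of class names.

```java
import java.util.Set;

/** Simplified, hypothetical model of the lookup order; no real class loading happens here. */
final class DelegationModel {
    enum Source { CUSTOM_CLASSPATH, PARENT }

    private DelegationModel() {}

    /**
     * Decides where a class would come from.
     * Non-system classes are tried on the custom classpath first; system classes skip it.
     * Anything not resolved there falls through to the parent, which may fail.
     */
    static Source resolve(String className,
                          Set<String> systemClasses,
                          Set<String> customClasspath,
                          Set<String> parentClasspath) throws ClassNotFoundException {
        boolean isSystemClass = systemClasses.contains(className); // exact match only (assumption)
        if (!isSystemClass && customClasspath.contains(className)) {
            return Source.CUSTOM_CLASSPATH;
        }
        if (parentClasspath.contains(className)) {
            return Source.PARENT;
        }
        throw new ClassNotFoundException(className);
    }
}
```

In this model, listing the aux service class itself among the system classes while the jar exists only on the custom classpath ends in ClassNotFoundException, which matches the reported log.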

[jira] [Updated] (YARN-11028) Add metrics for container allocation latency

2021-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11028:
--
Labels: pull-request-available  (was: )

> Add metrics for container allocation latency
> 
>
> Key: YARN-11028
> URL: https://issues.apache.org/jira/browse/YARN-11028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Updated] (YARN-11031) Improve the maintainability of RM webapp tests like TestRMWebServicesCapacitySched

2021-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11031:
--
Labels: pull-request-available  (was: )

> Improve the maintainability of RM webapp tests like 
> TestRMWebServicesCapacitySched
> --
>
> Key: YARN-11031
> URL: https://issues.apache.org/jira/browse/YARN-11031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It's hard to maintain the asserts in the TestRMWebServicesCapacitySched and 
> TestRMWebServicesCapacitySchedDynamicConfig test classes when the scheduler 
> response is modified. Currently only a subset of the scheduler response is 
> asserted in these tests.






[jira] [Updated] (YARN-11032) Fix Wrong format in RMAPPImpl

2021-12-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11032:
--
Labels: pull-request-available  (was: )

> Fix Wrong format in RMAPPImpl
> -
>
> Key: YARN-11032
> URL: https://issues.apache.org/jira/browse/YARN-11032
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: guophilipse
>Assignee: guophilipse
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fix Wrong format in RMAPPImpl






[jira] [Created] (YARN-11032) Fix Wrong format in RMAPPImpl

2021-12-06 Thread guophilipse (Jira)
guophilipse created YARN-11032:
--

 Summary: Fix Wrong format in RMAPPImpl
 Key: YARN-11032
 URL: https://issues.apache.org/jira/browse/YARN-11032
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: RM
Affects Versions: 3.3.1
Reporter: guophilipse
Assignee: guophilipse


Fix Wrong format in RMAPPImpl






[jira] [Commented] (YARN-8234) Improve RM system metrics publisher's performance by pushing events to timeline server in batch

2021-12-06 Thread Ashutosh Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453965#comment-17453965
 ] 

Ashutosh Gupta commented on YARN-8234:
--

Hi [~ziqian hu], I wanted to check whether you are still working on this. 
If not, I can take it up and raise the PR.

> Improve RM system metrics publisher's performance by pushing events to 
> timeline server in batch
> ---
>
> Key: YARN-8234
> URL: https://issues.apache.org/jira/browse/YARN-8234
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.8.3
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Critical
> Attachments: YARN-8234-branch-2.8.3.001.patch, 
> YARN-8234-branch-2.8.3.002.patch, YARN-8234-branch-2.8.3.003.patch, 
> YARN-8234-branch-2.8.3.004.patch, YARN-8234.001.patch, YARN-8234.002.patch, 
> YARN-8234.003.patch, YARN-8234.004.patch
>
>
> When the system metrics publisher is enabled, the RM pushes events to the 
> timeline server via a RESTful API. If the cluster load is heavy, many events 
> are sent to the timeline server and the timeline server's event handler thread 
> gets locked. YARN-7266 discusses this problem in detail. Because of the lock, 
> the timeline server can't receive events as fast as they are generated in the 
> RM, and many timeline events stay in the RM's memory. Eventually, those events 
> consume all of the RM's memory, and the RM either starts a full GC (which causes 
> a JVM stop-the-world pause and a timeout from the RM to ZooKeeper) or even hits 
> an OOM. 
> The main problem here is that the timeline server can't receive the RM's events 
> as fast as they are generated. Currently, the RM system metrics publisher puts 
> only one event in a request, and most of the time is spent handling HTTP headers 
> or other network connection overhead on the timeline server side. Very little 
> time is spent on processing the timeline event itself, which is the truly 
> valuable part.
> In this issue, we add a buffer to the system metrics publisher and let the 
> publisher send events to the timeline server in batches, one batch per request. 
> With the batch size set to 1000, in our experiment the rate at which the 
> timeline server receives events improved by 100x. We have implemented this in 
> our production environment, which accepts 2 apps in one hour, and it works fine.
> We add the following configuration:
>  * yarn.resourcemanager.system-metrics-publisher.batch-size: the number of 
> events the system metrics publisher sends in one request. The default value is 
> 1000.
>  * yarn.resourcemanager.system-metrics-publisher.buffer-size: the size of the 
> event buffer in the system metrics publisher.
>  * yarn.resourcemanager.system-metrics-publisher.interval-seconds: When 
> batch publishing is enabled, we must prevent the publisher from waiting for a 
> batch to fill up and holding events in the buffer for a long time. So we add 
> another thread that periodically sends the events in the buffer. This config 
> sets the interval of that periodic sending thread. The default value is 60s.
>  
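
The batching idea in the description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the attached patches: the class BatchingPublisher, its methods and the sender callback are hypothetical names, the buffer-size limit is left out because the description does not say what happens when the buffer is full, and the batch is sent while holding the lock for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers events and hands them to a sender one batch at a time. */
final class BatchingPublisher<E> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<E>> sender; // stands in for "one request to the timeline server"
    private final List<E> buffer = new ArrayList<>();
    private ScheduledExecutorService scheduler;

    BatchingPublisher(int batchSize, Consumer<List<E>> sender) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Buffers one event and sends a batch as soon as batchSize events are waiting. */
    synchronized void publish(E event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is currently buffered as a single batch; does nothing if empty. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<E> batch = new ArrayList<>(buffer);
        buffer.clear();
        sender.accept(batch);
    }

    /** Starts a background thread that flushes periodically so events never wait too long. */
    synchronized void startPeriodicFlush(long intervalSeconds) {
        if (scheduler != null) {
            return;
        }
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flush, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    /** Stops the periodic thread, if any, and sends the remaining events. */
    @Override
    public synchronized void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
        flush();
    }
}
```

In terms of the proposed settings, batchSize plays the role of batch-size and the argument of startPeriodicFlush plays the role of interval-seconds.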






[jira] [Resolved] (YARN-9063) ATS 1.5 fails to start if RollingLevelDb files are corrupt or missing

2021-12-06 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved YARN-9063.
-
Fix Version/s: 3.4.0
   2.10.2
   3.2.4
   3.3.3
   Resolution: Fixed

Committed to trunk, branch-3.3, branch-3.2, and branch-2.10. Thank you 
[~tarunparimi] for your report and thanks [~groot] for your contribution.

> ATS 1.5 fails to start if RollingLevelDb files are corrupt or missing
> -
>
> Key: YARN-9063
> URL: https://issues.apache.org/jira/browse/YARN-9063
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver, timelineservice
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.2.4, 3.3.3
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> ATS v1.5 fails to start up if some files are missing in 
> RollingLevelDBTimelineStore. YARN-6054 fixes this issue only for 
> LevelDBTimelineStore. Since RollingLevelDBTimelineStore opens multiple LevelDB 
> databases and rolls them, we need a separate fix for it. The error is shown below:
> {code}
> 18/11/13 07:00:56 FATAL applicationhistoryservice.ApplicationHistoryServer: 
> Error starting ApplicationHistoryServer 
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 
> missing files; e.g.: 
> /tmp/ats_folder/yarn/timeline/leveldb-timeline-store/owner-ldb/05.sst 
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>  
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) 
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>  
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:111)
>  
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) 
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:174)
>  
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:184)
>  
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 1 missing files; e.g.: 
> /tmp/ats-folder/yarn/timeline/leveldb-timeline-store/owner-ldb/05.sst 
> at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) 
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) 
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) 
> at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.serviceInit(RollingLevelDBTimelineStore.java:321)
>  
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> {code}






[jira] [Assigned] (YARN-9063) ATS 1.5 fails to start if RollingLevelDb files are corrupt or missing

2021-12-06 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reassigned YARN-9063:
---

Assignee: Ashutosh Gupta  (was: Tarun Parimi)

> ATS 1.5 fails to start if RollingLevelDb files are corrupt or missing
> -
>
> Key: YARN-9063
> URL: https://issues.apache.org/jira/browse/YARN-9063
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver, timelineservice
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>



[jira] [Commented] (YARN-11030) ClassNotFoundException when aux service class is loaded from customized classpath

2021-12-06 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453909#comment-17453909
 ] 

Prabhu Joseph commented on YARN-11030:
--

[~hadachi] Thanks for reporting the issue. It looks like this is a duplicate of 
[YARN-9967|https://issues.apache.org/jira/browse/YARN-9967]. Can you confirm? 
Thanks.
