[
https://issues.apache.org/jira/browse/YARN-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345553#comment-16345553
]
Rohith Sharma K S commented on YARN-7843:
-----------------------------------------
I just deployed a new cluster from yesterday's trunk build and reverted
YARN-2185. It is NOT a blocker, but some container starts failed with an NPE.
Typically we configure the maximum attempts as 20.
{noformat}
2018-01-30 17:43:42,756 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(791)) - Created localizer for container_1517329095523_0009_18_000001
2018-01-30 17:43:42,758 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1322)) - Writing credentials to the nmPrivate file /grid/0/hadoop/yarn/local/nmPrivate/container_1517329095523_0009_18_000001.tokens
2018-01-30 17:43:44,993 WARN localizer.ResourceLocalizationService (ResourceLocalizationService.java:processHeartbeat(1114)) - { hdfs://mycluster:9820/user/ambari-qa/.staging/job_1517329095523_0009/job.jar, 1517334174295, PATTERN, (?:classes/|lib/).* } failed: org.apache.hadoop.util.RunJar.unJarAndSave(Ljava/io/InputStream;Ljava/io/File;Ljava/lang/String;Ljava/util/regex/Pattern;)V
java.lang.NoSuchMethodError: org.apache.hadoop.util.RunJar.unJarAndSave(Ljava/io/InputStream;Ljava/io/File;Ljava/lang/String;Ljava/util/regex/Pattern;)V
    at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:354)
    at org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:303)
    at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:283)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-01-30 17:43:44,993 INFO container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_1517329095523_0009_18_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
2018-01-30 17:43:44,993 INFO localizer.LocalResourcesTrackerImpl (LocalResourcesTrackerImpl.java:handle(160)) - Container container_1517329095523_0009_18_000001 sent RELEASE event on a resource request { hdfs://mycluster:9820/user/ambari-qa/.staging/job_1517329095523_0009/job.jar, 1517334174295, PATTERN, (?:classes/|lib/).* } not present in cache.
2018-01-30 17:43:44,994 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:cleanupPrivLocalizers(812)) - Interrupting localizer for container_1517329095523_0009_18_000001
2018-01-30 17:43:44,994 WARN ipc.Server (Server.java:logException(2717)) - IPC Server handler 2 on 8040, call Call#9 Retry#0 org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB.heartbeat from 172.27.26.136:39992
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1189)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1153)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:753)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:371)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
    at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2018-01-30 17:43:44,994 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:interrupt(1203)) - Destroying localization shell process for container_1517329095523_0009_18_000001
2018-01-30 17:43:44,995 INFO container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_1517329095523_0009_18_000001 transitioned from LOCALIZATION_FAILED to DONE
2018-01-30 17:43:44,995 INFO application.ApplicationImpl (ApplicationImpl.java:transition(489)) - Removing container_1517329095523_0009_18_000001 from application application_1517329095523_0009
2018-01-30 17:43:44,995 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping resource-monitoring for container_1517329095523_0009_18_000001
{noformat}
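The NoSuchMethodError in the log above names RunJar.unJarAndSave(InputStream, File, String, Pattern). An error of this kind generally means the class loaded at runtime lacks a method the caller was compiled against, so one possible check is to probe the NodeManager classpath for that signature. The following is only a sketch: MethodProbe and hasMethod are hypothetical names and not part of Hadoop, and it assumes the probe is run with the same classpath as the localizer.

```java
import java.io.File;
import java.io.InputStream;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Hadoop.
final class MethodProbe {

    // Returns true if the named class is loadable and exposes a public
    // method with the given name and parameter types.
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            // initialize=false: load the class without running static initializers.
            Class<?> cls = Class.forName(className, false, MethodProbe.class.getClassLoader());
            // getMethod only finds public methods (including inherited ones).
            cls.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Parameter types taken from the descriptor in the NoSuchMethodError above.
        boolean present = hasMethod("org.apache.hadoop.util.RunJar", "unJarAndSave",
                InputStream.class, File.class, String.class, Pattern.class);
        System.out.println("RunJar.unJarAndSave present: " + present);
    }
}
```

If this prints false when run against the deployed jars, that would be consistent with the hadoop-common jar on the node being older than the YARN jars calling into it, though the sketch by itself does not show which jar is stale.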
> Container Localizer is failing with NPE
> ---------------------------------------
>
> Key: YARN-7843
> URL: https://issues.apache.org/jira/browse/YARN-7843
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Rohith Sharma K S
> Priority: Blocker
>
> Container localizers are failing with an NPE; as a result, none of the
> containers are getting launched!
> {noformat}
> Caused by: java.lang.NullPointerException: java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1189)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1153)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:753)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:371)
>     at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
> {noformat}