[ 
https://issues.apache.org/jira/browse/YARN-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345553#comment-16345553
 ] 

Rohith Sharma K S commented on YARN-7843:
-----------------------------------------

I just deployed a new cluster on yesterday's trunk build and reverted 
YARN-2185. This is NOT a blocker, but some containers fail to start with an 
NPE. We typically configure the maximum attempts as 20. 

{noformat}
2018-01-30 17:43:42,756 INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(791)) - Created localizer for container_1517329095523_0009_18_000001
2018-01-30 17:43:42,758 INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1322)) - Writing credentials to the nmPrivate file /grid/0/hadoop/yarn/local/nmPrivate/container_1517329095523_0009_18_000001.tokens
2018-01-30 17:43:44,993 WARN  localizer.ResourceLocalizationService (ResourceLocalizationService.java:processHeartbeat(1114)) - { hdfs://mycluster:9820/user/ambari-qa/.staging/job_1517329095523_0009/job.jar, 1517334174295, PATTERN, (?:classes/|lib/).* } failed: org.apache.hadoop.util.RunJar.unJarAndSave(Ljava/io/InputStream;Ljava/io/File;Ljava/lang/String;Ljava/util/regex/Pattern;)V
java.lang.NoSuchMethodError: org.apache.hadoop.util.RunJar.unJarAndSave(Ljava/io/InputStream;Ljava/io/File;Ljava/lang/String;Ljava/util/regex/Pattern;)V
        at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:354)
        at org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:303)
        at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:283)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-01-30 17:43:44,993 INFO  container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_1517329095523_0009_18_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
2018-01-30 17:43:44,993 INFO  localizer.LocalResourcesTrackerImpl (LocalResourcesTrackerImpl.java:handle(160)) - Container container_1517329095523_0009_18_000001 sent RELEASE event on a resource request { hdfs://mycluster:9820/user/ambari-qa/.staging/job_1517329095523_0009/job.jar, 1517334174295, PATTERN, (?:classes/|lib/).* } not present in cache.
2018-01-30 17:43:44,994 INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:cleanupPrivLocalizers(812)) - Interrupting localizer for container_1517329095523_0009_18_000001
2018-01-30 17:43:44,994 WARN  ipc.Server (Server.java:logException(2717)) - IPC Server handler 2 on 8040, call Call#9 Retry#0 org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB.heartbeat from 172.27.26.136:39992
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1189)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1153)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:753)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:371)
        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
2018-01-30 17:43:44,994 INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:interrupt(1203)) - Destroying localization shell process for container_1517329095523_0009_18_000001
2018-01-30 17:43:44,995 INFO  container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_1517329095523_0009_18_000001 transitioned from LOCALIZATION_FAILED to DONE
2018-01-30 17:43:44,995 INFO  application.ApplicationImpl (ApplicationImpl.java:transition(489)) - Removing container_1517329095523_0009_18_000001 from application application_1517329095523_0009
2018-01-30 17:43:44,995 INFO  monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping resource-monitoring for container_1517329095523_0009_18_000001
{noformat}
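
In the log above, the first failure is the NoSuchMethodError on RunJar.unJarAndSave; the NPE in getPathForLocalization only appears afterwards, once the container has moved to LOCALIZATION_FAILED. A NoSuchMethodError generally means the class loaded at runtime does not have the method the caller was compiled against, for example when jars from different builds are mixed on the classpath. Whether that is what happened here is an assumption, not something the log confirms. A minimal sketch for checking it on a node, run with the NodeManager classpath: the class name MethodCheck and the helper hasMethod are hypothetical, and only standard java.lang reflection calls are used.

```java
import java.io.File;
import java.io.InputStream;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of Hadoop.
public class MethodCheck {

    /**
     * Returns true if className can be loaded and declares a method with this
     * name and exactly these parameter types. The return type is not compared,
     * so this is weaker than the full JVM descriptor check.
     */
    public static boolean hasMethod(String className, String methodName,
                                    Class<?>... paramTypes) {
        try {
            // initialize=false: load the class without running static initializers
            Class<?> cls = Class.forName(className, false,
                    MethodCheck.class.getClassLoader());
            cls.getDeclaredMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Parameter types taken from the descriptor in the NoSuchMethodError:
        // (Ljava/io/InputStream;Ljava/io/File;Ljava/lang/String;Ljava/util/regex/Pattern;)V
        boolean present = hasMethod("org.apache.hadoop.util.RunJar", "unJarAndSave",
                InputStream.class, File.class, String.class, Pattern.class);
        System.out.println("RunJar.unJarAndSave(InputStream, File, String, Pattern) present: "
                + present);
    }
}
```

If this reports false on a node where localization fails, the hadoop-common jar on that node most likely does not match the build the YARN jars were compiled against.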

> Container Localizer is failing with NPE
> ---------------------------------------
>
>                 Key: YARN-7843
>                 URL: https://issues.apache.org/jira/browse/YARN-7843
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Rohith Sharma K S
>            Priority: Blocker
>
> Container localizers are failing with an NPE; as a result, none of the 
> containers are getting launched!
> {noformat}
> Caused by: java.lang.NullPointerException: java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:503)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1189)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1153)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:753)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:371)
>         at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
> {noformat}


