[ 
https://issues.apache.org/jira/browse/YARN-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719066#comment-13719066
 ] 

Alejandro Abdelnur commented on YARN-960:
-----------------------------------------

LGTM. Still, even with this patch I cannot get the pi example to work in a
pseudo-distributed setup; the localization of the AM is failing with:

{code}
2013-07-24 16:58:19,057 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
successful for appattempt_1374710243541_0001_000002 (auth:SIMPLE)
2013-07-24 16:58:19,061 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Start request for container_1374710243541_0001_02_000001 by user tucu
2013-07-24 16:58:19,061 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=tucu 
IP=172.21.3.149 OPERATION=Start Container Request       
TARGET=ContainerManageImpl      RESULT=SUCCESS  
APPID=application_1374710243541_0001    
CONTAINERID=container_1374710243541_0001_02_000001
2013-07-24 16:58:19,061 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Adding container_1374710243541_0001_02_000001 to application 
application_1374710243541_0001
2013-07-24 16:58:19,062 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1374710243541_0001_02_000001 transitioned from NEW to 
LOCALIZING
2013-07-24 16:58:19,062 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://localhost:8020/tmp/hadoop-yarn/staging/tucu/.staging/job_1374710243541_0001/job.jar
 transitioned from INIT to DOWNLOADING
2013-07-24 16:58:19,062 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Created localizer for container_1374710243541_0001_02_000001
2013-07-24 16:58:19,109 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Writing credentials to the nmPrivate file 
/tmp/hadoop-tucu/nm-local-dir/nmPrivate/container_1374710243541_0001_02_000001.tokens.
 Credentials list: 
2013-07-24 16:58:19,130 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
Initializing user tucu
2013-07-24 16:58:19,255 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying 
from 
/tmp/hadoop-tucu/nm-local-dir/nmPrivate/container_1374710243541_0001_02_000001.tokens
 to 
/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/container_1374710243541_0001_02_000001.tokens
2013-07-24 16:58:19,256 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to 
/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001
 = 
file:/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001
2013-07-24 16:58:19,691 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 DEBUG: FAILED { 
hdfs://localhost:8020/tmp/hadoop-yarn/staging/tucu/.staging/job_1374710243541_0001/job.jar,
 1374710294773, PATTERN, (?:classes/|lib/).* }, rename destination 
/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/filecache/12
 already exists.
2013-07-24 16:58:19,692 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://localhost:8020/tmp/hadoop-yarn/staging/tucu/.staging/job_1374710243541_0001/job.jar
 transitioned from DOWNLOADING to FAILED
2013-07-24 16:58:19,692 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1374710243541_0001_02_000001 transitioned from LOCALIZING 
to LOCALIZATION_FAILED
2013-07-24 16:58:19,693 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
 Container container_1374710243541_0001_02_000001 sent RELEASE event on a 
resource request { 
hdfs://localhost:8020/tmp/hadoop-yarn/staging/tucu/.staging/job_1374710243541_0001/job.jar,
 1374710294773, PATTERN, (?:classes/|lib/).* } not present in cache.
2013-07-24 16:58:19,694 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
absolute path : 
/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/container_1374710243541_0001_02_000001
2013-07-24 16:58:19,694 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Unknown localizer with localizerId container_1374710243541_0001_02_000001 is 
sending heartbeat. Ordering it to DIE
2013-07-24 16:58:19,694 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: delete 
returned false for path: 
[/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/container_1374710243541_0001_02_000001]
2013-07-24 16:58:19,694 WARN 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=tucu 
OPERATION=Container Finished - Failed   TARGET=ContainerImpl    RESULT=FAILURE  
DESCRIPTION=Container failed with state: LOCALIZATION_FAILED    
APPID=application_1374710243541_0001    
CONTAINERID=container_1374710243541_0001_02_000001
2013-07-24 16:58:19,694 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1374710243541_0001_02_000001 transitioned from 
LOCALIZATION_FAILED to DONE
2013-07-24 16:58:19,695 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Removing container_1374710243541_0001_02_000001 from application 
application_1374710243541_0001
2013-07-24 16:58:19,695 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 ResourceCalculatorPlugin is unavailable on this system. 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
 is disabled.
2013-07-24 16:58:19,695 WARN org.apache.hadoop.ipc.Client: interrupted waiting 
to send rpc request to server
java.lang.InterruptedException
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at 
org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1048)
        at org.apache.hadoop.ipc.Client.call(Client.java:1401)
        at org.apache.hadoop.ipc.Client.call(Client.java:1381)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy25.heartbeat(Unknown Source)
        at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:250)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:164)
        at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:980)
2013-07-24 16:58:20,050 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 
status for container: container_id {, app_attempt_id {, application_id {, id: 
1, cluster_timestamp: 1374710243541, }, attemptId: 2, }, id: 1, }, state: 
C_COMPLETE, diagnostics: "rename destination 
/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/filecache/12
 already exists.\n", exit_status: -1000, 
2013-07-24 16:58:20,050 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
completed container container_1374710243541_0001_02_000001
2013-07-24 16:58:21,057 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Application application_1374710243541_0001 transitioned from RUNNING to 
APPLICATION_RESOURCES_CLEANINGUP
2013-07-24 16:58:21,058 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
absolute path : 
/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001
2013-07-24 16:58:21,058 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
event APPLICATION_STOP for appId application_1374710243541_0001
2013-07-24 16:58:21,061 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Application application_1374710243541_0001 transitioned from 
APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2013-07-24 16:58:21,061 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
 Scheduling Log Deletion for application: application_1374710243541_0001, with 
delay of 10800 seconds
{code}

I wonder if this is related to the fallout from the token changes.
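
A minimal sketch, not part of the patch, for pulling the state transitions out of a NodeManager log such as the one above. The function name `find_transitions` and the regular expression are assumptions made for illustration; the message wording is taken from the log lines quoted here, and other Hadoop versions may format them differently.

```python
import re

# Matches NodeManager messages of the form "<subject> transitioned from A to B".
# The pattern is an assumption based on the log lines quoted in this comment.
TRANSITION = re.compile(r"(\S+) transitioned from (\w+) to (\w+)")


def find_transitions(log_text):
    """Return (subject, old_state, new_state) tuples in log order."""
    # Mail archives hard-wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return TRANSITION.findall(flat)


# Sample copied from the log above, including the mail-archive line wrapping.
sample = """
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1374710243541_0001_02_000001 transitioned from LOCALIZING
to LOCALIZATION_FAILED
"""

for subject, old, new in find_transitions(sample):
    print(f"{subject}: {old} -> {new}")
```

Run against the full log, this should list the container, resource, and application transitions in order, which makes the LOCALIZING to LOCALIZATION_FAILED step easier to spot.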
                
> TestMRCredentials and TestBinaryTokenFile are failing on trunk
> --------------------------------------------------------------
>
>                 Key: YARN-960
>                 URL: https://issues.apache.org/jira/browse/YARN-960
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Alejandro Abdelnur
>            Assignee: Daryn Sharp
>            Priority: Blocker
>             Fix For: 2.1.0-beta
>
>         Attachments: YARN-960.patch
>
>
> Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945.
> Making it a blocker until full impact of the issue is scoped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
