[ https://issues.apache.org/jira/browse/YARN-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719066#comment-13719066 ]
Alejandro Abdelnur commented on YARN-960:
-----------------------------------------
LGTM. Even with this patch, though, I still cannot get the pi example to work in a pseudo-distributed setup; localization of the AM fails with:
{code}
2013-07-24 16:58:19,057 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1374710243541_0001_000002 (auth:SIMPLE)
2013-07-24 16:58:19,061 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1374710243541_0001_02_000001 by user tucu
2013-07-24 16:58:19,061 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=tucu IP=172.21.3.149 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1374710243541_0001 CONTAINERID=container_1374710243541_0001_02_000001
2013-07-24 16:58:19,061 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1374710243541_0001_02_000001 to application application_1374710243541_0001
2013-07-24 16:58:19,062 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1374710243541_0001_02_000001 transitioned from NEW to LOCALIZING
2013-07-24 16:58:19,062 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:8020/tmp/hadoop-yarn/staging/tucu/.staging/job_1374710243541_0001/job.jar transitioned from INIT to DOWNLOADING
2013-07-24 16:58:19,062 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1374710243541_0001_02_000001
2013-07-24 16:58:19,109 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /tmp/hadoop-tucu/nm-local-dir/nmPrivate/container_1374710243541_0001_02_000001.tokens. Credentials list:
2013-07-24 16:58:19,130 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user tucu
2013-07-24 16:58:19,255 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /tmp/hadoop-tucu/nm-local-dir/nmPrivate/container_1374710243541_0001_02_000001.tokens to /tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/container_1374710243541_0001_02_000001.tokens
2013-07-24 16:58:19,256 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001 = file:/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001
2013-07-24 16:58:19,691 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://localhost:8020/tmp/hadoop-yarn/staging/tucu/.staging/job_1374710243541_0001/job.jar, 1374710294773, PATTERN, (?:classes/|lib/).* }, rename destination /tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/filecache/12 already exists.
2013-07-24 16:58:19,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:8020/tmp/hadoop-yarn/staging/tucu/.staging/job_1374710243541_0001/job.jar transitioned from DOWNLOADING to FAILED
2013-07-24 16:58:19,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1374710243541_0001_02_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
2013-07-24 16:58:19,693 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1374710243541_0001_02_000001 sent RELEASE event on a resource request { hdfs://localhost:8020/tmp/hadoop-yarn/staging/tucu/.staging/job_1374710243541_0001/job.jar, 1374710294773, PATTERN, (?:classes/|lib/).* } not present in cache.
2013-07-24 16:58:19,694 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/container_1374710243541_0001_02_000001
2013-07-24 16:58:19,694 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Unknown localizer with localizerId container_1374710243541_0001_02_000001 is sending heartbeat. Ordering it to DIE
2013-07-24 16:58:19,694 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: delete returned false for path: [/tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/container_1374710243541_0001_02_000001]
2013-07-24 16:58:19,694 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=tucu OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED APPID=application_1374710243541_0001 CONTAINERID=container_1374710243541_0001_02_000001
2013-07-24 16:58:19,694 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1374710243541_0001_02_000001 transitioned from LOCALIZATION_FAILED to DONE
2013-07-24 16:58:19,695 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1374710243541_0001_02_000001 from application application_1374710243541_0001
2013-07-24 16:58:19,695 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: ResourceCalculatorPlugin is unavailable on this system. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is disabled.
2013-07-24 16:58:19,695 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279)
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1048)
	at org.apache.hadoop.ipc.Client.call(Client.java:1401)
	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy25.heartbeat(Unknown Source)
	at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:250)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:980)
2013-07-24 16:58:20,050 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1374710243541, }, attemptId: 2, }, id: 1, }, state: C_COMPLETE, diagnostics: "rename destination /tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001/filecache/12 already exists.\n", exit_status: -1000,
2013-07-24 16:58:20,050 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1374710243541_0001_02_000001
2013-07-24 16:58:21,057 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1374710243541_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2013-07-24 16:58:21,058 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-tucu/nm-local-dir/usercache/tucu/appcache/application_1374710243541_0001
2013-07-24 16:58:21,058 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1374710243541_0001
2013-07-24 16:58:21,061 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1374710243541_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2013-07-24 16:58:21,061 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1374710243541_0001, with delay of 10800 seconds
{code}
I wonder if this is related to the fallout from the token changes.
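For reference, the failing step in the log is a rename whose destination (filecache/12 under the application's appcache directory) already exists. The snippet below is only an analogy built on the JDK's java.nio.file.Files.move; it is not the NodeManager's localization code, and the class RenameCollision, the method renameRejected and the "download into <dir>_tmp, then rename to <dir>" layout are hypothetical, chosen for illustration. It shows that a rename without an overwrite option succeeds on a fresh destination but is rejected once a directory with that name is already present. A leftover from an earlier attempt (the failing container belongs to attempt 02) is one possible source of such a directory, though the log alone does not confirm that.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: JDK rename semantics, not YARN's FSDownload/FileContext code.
public class RenameCollision {

    /**
     * Hypothetical "download into <dirName>_tmp, then rename to <dirName>" step.
     * Returns true if the final rename was rejected because the destination already exists.
     */
    public static boolean renameRejected(Path cacheRoot, String dirName) throws IOException {
        Path work = Files.createDirectories(cacheRoot.resolve(dirName + "_tmp"));
        Files.createFile(work.resolve("job.jar")); // empty stand-in for the localized resource
        Path dest = cacheRoot.resolve(dirName);
        try {
            // No StandardCopyOption.REPLACE_EXISTING: an existing target is never clobbered.
            Files.move(work, dest);
            return false;
        } catch (FileAlreadyExistsException e) {
            // Message modeled on the NM diagnostic, using the path the JDK reports.
            System.out.println("rename destination " + e.getFile() + " already exists.");
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("filecache");
        // Fresh destination: the rename goes through.
        System.out.println("fresh destination rejected? " + renameRejected(root, "11"));
        // Simulate a leftover directory with the same name as the destination.
        Files.createDirectories(root.resolve("12"));
        System.out.println("leftover destination rejected? " + renameRejected(root, "12"));
    }
}
```

Running main should report false for the fresh directory and true for the pre-existing one; passing StandardCopyOption.REPLACE_EXISTING would only help if the existing destination directory were empty.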
> TestMRCredentials and TestBinaryTokenFile are failing on trunk
> ---------------------------------------------------------------
>
> Key: YARN-960
> URL: https://issues.apache.org/jira/browse/YARN-960
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.1.0-beta
> Reporter: Alejandro Abdelnur
> Assignee: Daryn Sharp
> Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: YARN-960.patch
>
>
> Not sure, but this may be fallout from YARN-701 and/or related to YARN-945.
> Making it a blocker until the full impact of the issue is scoped.