[ https://issues.apache.org/jira/browse/YARN-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568914#comment-13568914 ]
Eli Reisman commented on YARN-367:
----------------------------------
This is great, I was actually about to do the same thing. After doing 3
installs of Hadoop 2.0.x in a row according to the online instructions and
hitting weird conf problems each time, I am sure there are a number of
settings in the confs that result in errors just like this. The AM container
failure never reports a good exception message, and sometimes even the
nodemanager logs of the job attempt are not very specific, due to the nature
of the missing config, or ...?
Point being, it's almost always a config setting, and the framework almost
never provides a clue as to what really made it choke, other than job start
and immediate failure (outfile not found, as in Zhijie's example here, or
something more cryptic).
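
A pre-flight check along these lines would at least flag a missing key before
a job is submitted. This is only a sketch: the function names and the sample
XML are made up for illustration, and it looks only at what a single
*-site.xml string sets explicitly, not at Hadoop's built-in defaults.

```python
import xml.etree.ElementTree as ET


def explicit_properties(site_xml: str) -> dict:
    """Return {name: value} for every <property> in a *-site.xml string."""
    root = ET.fromstring(site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_keys(site_xml: str, required: list) -> list:
    """List the required keys that the file does not set explicitly."""
    props = explicit_properties(site_xml)
    return [key for key in required if key not in props]


# Hypothetical sample file that sets only hadoop.tmp.dir, as in this issue.
SAMPLE = """<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(missing_keys(SAMPLE, ["yarn.nodemanager.local-dirs"]))
```

Which keys count as "required" is a judgment call per install; the list passed
in here is just the one from this issue.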
> Exception when yarn.nodemanager.local-dirs is not explicitly set
> ----------------------------------------------------------------
>
> Key: YARN-367
> URL: https://issues.apache.org/jira/browse/YARN-367
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
>
> If yarn.nodemanager.local-dirs is not explicitly set, and if the default
> local-dirs are not the children of hadoop.tmp.dir, the exception will occur
> when the wordcount example is run. Below is the log info.
> ==========
> 2013-01-30 22:16:04,229 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Start request for container_1359612879014_0001_01_000001 by user zshen
> 2013-01-30 22:16:04,247 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Creating a new application reference for app application_1359612879014_0001
> 2013-01-30 22:16:04,250 INFO
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=zshen
> IP=127.0.0.1 OPERATION=Start Container Request
> TARGET=ContainerManageImpl RESULT=SUCCESS
> APPID=application_1359612879014_0001
> CONTAINERID=container_1359612879014_0001_01_000001
> 2013-01-30 22:16:04,252 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1359612879014_0001 transitioned from NEW to INITING
> 2013-01-30 22:16:04,252 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Adding container_1359612879014_0001_01_000001 to application
> application_1359612879014_0001
> 2013-01-30 22:16:04,257 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1359612879014_0001 transitioned from INITING to
> RUNNING
> 2013-01-30 22:16:04,262 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from NEW to
> LOCALIZING
> 2013-01-30 22:16:04,268 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/appTokens
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.jar
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.splitmetainfo
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.split
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,269 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.xml
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,269 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Created localizer for container_1359612879014_0001_01_000001
> 2013-01-30 22:16:04,401 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Writing credentials to the nmPrivate file
> /tmp/hadoop-zshen/nm-local-dir/nmPrivate/container_1359612879014_0001_01_000001.tokens.
> Credentials list:
> 2013-01-30 22:16:04,423 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Initializing user zshen
> 2013-01-30 22:16:04,569 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying
> from
> /tmp/hadoop-zshen/nm-local-dir/nmPrivate/container_1359612879014_0001_01_000001.tokens
> to
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001.tokens
> 2013-01-30 22:16:04,570 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set
> to
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> =
> file:/tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> 2013-01-30 22:16:04,955 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out
> status for container: container_id {, app_attempt_id {, application_id {, id:
> 1, cluster_timestamp: 1359612879014, }, attemptId: 1, }, id: 1, }, state:
> C_RUNNING, diagnostics: "", exit_status: -1000,
> 2013-01-30 22:16:05,117 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/appTokens
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,312 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.jar
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,465 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.splitmetainfo
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,608 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.split
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,751 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.xml
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,752 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from
> LOCALIZING to LOCALIZED
> 2013-01-30 22:16:05,866 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from LOCALIZED
> to RUNNING
> 2013-01-30 22:16:05,866 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> ResourceCalculatorPlugin is unavailable on this system.
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> is disabled.
> 2013-01-30 22:16:05,910 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Failed to launch container.
> java.io.FileNotFoundException: File
> /Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001
> does not exist
> at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:498)
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:996)
> at
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
> at
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2379)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:726)
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:330)
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:135)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:680)
> 2013-01-30 22:16:05,913 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from RUNNING
> to EXITED_WITH_FAILURE
> 2013-01-30 22:16:05,914 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Cleaning up container container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,934 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting
> absolute path :
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,934 WARN
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=zshen
> OPERATION=Container Finished - Failed TARGET=ContainerImpl
> RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE
> APPID=application_1359612879014_0001
> CONTAINERID=container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,937 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from
> EXITED_WITH_FAILURE to DONE
> 2013-01-30 22:16:05,937 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Removing container_1359612879014_0001_01_000001 from application
> application_1359612879014_0001
> 2013-01-30 22:16:05,937 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> ResourceCalculatorPlugin is unavailable on this system.
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> is disabled.
> 2013-01-30 22:16:05,958 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out
> status for container: container_id {, app_attempt_id {, application_id {, id:
> 1, cluster_timestamp: 1359612879014, }, attemptId: 1, }, id: 1, }, state:
> C_COMPLETE, diagnostics: "", exit_status: -1,
> 2013-01-30 22:16:05,959 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed
> completed container container_1359612879014_0001_01_000001
> 2013-01-30 22:16:06,965 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1359612879014_0001 transitioned from RUNNING to
> APPLICATION_RESOURCES_CLEANINGUP
> 2013-01-30 22:16:06,965 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting
> absolute path :
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> 2013-01-30 22:16:06,966 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event APPLICATION_STOP for appId application_1359612879014_0001
> 2013-01-30 22:16:06,970 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1359612879014_0001 transitioned from
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2013-01-30 22:16:06,970 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
> Scheduling Log Deletion for application: application_1359612879014_0001,
> with delay of 10800 seconds
> ==========
> Below is the setting in hdfs-site.xml.
> ==========
> <property>
> <name>hadoop.tmp.dir</name>
> <value>/Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data</value>
> </property>
> ==========
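
The log quoted above mixes two different nm-local-dir roots: the localizer
writes under /tmp/hadoop-zshen/nm-local-dir, while the failed launch looks
under the hadoop.tmp.dir value from the config. A minimal sketch for spotting
that kind of mismatch in a nodemanager log follows. It assumes the paths use
the nm-local-dir directory name seen in this log; the regex and function names
are illustrative and not part of Hadoop.

```python
import re

# An absolute path up to and including the nm-local-dir component.
ROOT_RE = re.compile(r"(/[^\s:]*?/nm-local-dir)(?=/|\s|$)")


def local_dir_roots(log_text: str) -> set:
    """Collect every distinct .../nm-local-dir root mentioned in the log."""
    return set(ROOT_RE.findall(log_text))


def has_mismatch(log_text: str) -> bool:
    """True when the log refers to more than one nm-local-dir root."""
    return len(local_dir_roots(log_text)) > 1


if __name__ == "__main__":
    import sys

    text = sys.stdin.read()
    for root in sorted(local_dir_roots(text)):
        print(root)
    if has_mismatch(text):
        print("WARNING: more than one nm-local-dir root in this log")
```

Piping a nodemanager log into the script prints each root it finds. More than
one root is only a hint that components resolved the local dirs differently,
since a node can legitimately be configured with several local dirs.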