[ https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602895#comment-13602895 ]
Eli Reisman commented on YARN-477:
----------------------------------
The NodeManager log for the MiniYARNCluster DID get a log report for the App
Master that could only have come from the shell command failing:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/giraph/yarn/GiraphApplicationMaster
Caused by: java.lang.ClassNotFoundException: org.apache.giraph.yarn.GiraphApplicationMaster
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
{code}
So that's good. But I don't think it's propagating this to the MiniYARNCluster's
RM or to my Client. On my Client's end, the logs are endless heartbeat messages
with a -1000 exitCode until I Ctrl-C out of the test suite.
FYI, in case I wasn't clear: this is not a priority or a blocker for my Giraph
on YARN port; it all works now (including the test). But it should probably get
investigated/fixed soon if I've really found something here ;)
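The -1000 in those heartbeat messages looks like YARN's placeholder for "no exit status reported yet" (it matches the value of `ContainerExitStatus.INVALID`), so a client that waits for a real exit code will wait forever if the AM never starts. Until propagation is fixed, a client could at least bound that wait. The following is a minimal, hypothetical sketch of such a watchdog, assuming the client can read the AM's exit code through some callback; `ExitCodeWatchdog`, `await`, and `Outcome` are illustrative names, not part of the YARN API.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/**
 * Hypothetical client-side guard (not part of the YARN API): waits for the AM
 * container to report a real exit code, but gives up after a deadline instead
 * of polling forever.
 */
final class ExitCodeWatchdog {
    /** Value seen in the Client's heartbeat logs; assumed to mean "no exit status yet". */
    static final int NO_EXIT_STATUS = -1000;

    enum Outcome { EXITED, TIMED_OUT, INTERRUPTED }

    private ExitCodeWatchdog() {}

    /**
     * Calls {@code exitCode} every {@code pollMillis} ms until it returns something
     * other than {@link #NO_EXIT_STATUS}, or until {@code timeoutMillis} ms have elapsed.
     */
    static Outcome await(IntSupplier exitCode, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (exitCode.getAsInt() != NO_EXIT_STATUS) {
                return Outcome.EXITED; // a real exit code arrived; the caller can act on it
            }
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT; // still no exit code: treat the launch as failed
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return Outcome.INTERRUPTED;
            }
        }
    }

    public static void main(String[] args) {
        // Simulates the hang described here: the exit code never leaves the sentinel.
        System.out.println(await(() -> NO_EXIT_STATUS, 10, 100)); // TIMED_OUT

        // Simulates a normal failure report arriving on the third poll.
        int[] polls = {0};
        System.out.println(await(() -> ++polls[0] < 3 ? NO_EXIT_STATUS : 1, 10, 1_000)); // EXITED
    }
}
{code}

In a real Client the callback would read the exit code from whatever report it already polls, and on TIMED_OUT it could ask the RM to kill the application (for example via the YARN client API's `killApplication`); that wiring is deliberately left out of this sketch.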
> When default container executor fails right away, at the CLI launching our
> App Master, Client doesn't always get the signal to kill the job
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-477
> URL: https://issues.apache.org/jira/browse/YARN-477
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Eli Reisman
> Assignee: Zhijie Shen
>
> I have been porting Giraph to YARN (GIRAPH-13 is the issue). When I launch
> my App Master and the container command line starts it successfully, any
> failure in the App Master or my launched Giraph Tasks is promptly reported to
> the Client and ends my job run. However, if the command line sent to the App
> Master container fails to launch it at all, the error exit code does not
> propagate. My client hangs with the job at containersUsed == 1 and state ==
> ACCEPTED for as long as you want to sit and wait before Ctrl-C'ing your way
> out.
> Disclaimer: this could be my fault, but I wanted to throw it out there in
> case it's not. When this happens I also get no error logs, since the App
> Master never launched, so I have no visibility into why it failed to
> launch. I am sure it's not launching, but the client IS sending the app
> request, getting a container for my AM, and I can see the command line run on
> the container in my logs. That's all.
> Thanks! If this is a dup or "won't fix" for some reason, let me know and
> sorry for wasting your time!
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira