[ 
https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602895#comment-13602895
 ] 

Eli Reisman commented on YARN-477:
----------------------------------

The nodemanager log for MiniYARNCluster DID get a log report for the app master 
that could only have come from the shell command failing:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/giraph/yarn/GiraphApplicationMaster
Caused by: java.lang.ClassNotFoundException: 
org.apache.giraph.yarn.GiraphApplicationMaster
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
{code}

So that's good. But I don't think it's propagating this to the MiniYARNCluster's 
RM or my Client. From my Client's end, the logs are endless heartbeat messages 
with -1000 exitCode until I Ctrl-C out of the test suite.
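
Until the exit code propagates, a client-side guard could bound the wait instead of polling forever. The following is only a minimal sketch of that idea, not code from Giraph or YARN: the `LaunchWatchdog` class, the `AppState` and `Outcome` enums, and the `awaitLaunch` helper are all hypothetical names, and the `Supplier` stands in for whatever call the client uses to fetch the application state from the RM.

```java
import java.util.function.Supplier;

public class LaunchWatchdog {
    // Hypothetical stand-in for the states a client sees; NOT the real YARN enum.
    public enum AppState { ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    // What the client learned while waiting for the app master to start.
    public enum Outcome { LAUNCHED, TERMINATED, TIMED_OUT }

    // Polls the state at most maxPolls times, sleeping sleepMillis between polls.
    // The supplier is an assumption: it represents the client's status request.
    public static Outcome awaitLaunch(Supplier<AppState> poll, int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            AppState state = poll.get();
            switch (state) {
                case RUNNING:
                    return Outcome.LAUNCHED;
                case FINISHED:
                case FAILED:
                case KILLED:
                    return Outcome.TERMINATED;
                default:
                    // Still ACCEPTED: wait before asking again.
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException e) {
                        // Treat an interrupt as giving up on the wait.
                        Thread.currentThread().interrupt();
                        return Outcome.TIMED_OUT;
                    }
            }
        }
        // The app never left ACCEPTED within the allowed number of polls.
        return Outcome.TIMED_OUT;
    }

    public static void main(String[] args) {
        // Simulates the hang described above: the state never leaves ACCEPTED.
        Outcome stuck = awaitLaunch(() -> AppState.ACCEPTED, 5, 10L);
        System.out.println(stuck); // prints TIMED_OUT
    }
}
```

On `TIMED_OUT` a client could kill the application and point the user at the nodemanager logs, rather than heartbeating until someone interrupts it. This only works around the symptom; it does not address the missing propagation itself.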

FYI, in case I wasn't clear: this is not a priority or blocker for my Giraph on 
YARN work; it all works now (including the test). But it should probably get 
investigated/fixed soon if I've really found something here ;)


                
> When default container executor fails right away, at the CLI launching our 
> App Master, Client doesn't always get the signal to kill the job
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-477
>                 URL: https://issues.apache.org/jira/browse/YARN-477
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eli Reisman
>            Assignee: Zhijie Shen
>
> I have been porting Giraph to YARN (GIRAPH-13 is the issue) and when I launch 
> my App Master, if the container command line runs it successfully, any 
> failure in the App Master or my launched Giraph Tasks promptly reports to 
> Client and ends my job run. However, if the command line sent to the app 
> master container fails to launch it at all, the error exit code is not 
> propagating. My client hangs with the job at containersUsed == 1 and state == 
> ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way 
> out.
> Disclaimer: this could be my fault. But I wanted to throw it out there in 
> case it's not. I am also (when this happens) not getting error logs since the 
> app master never launched, so I really have no visibility into why it failed 
> to launch. I am sure it's not launching, but the client IS sending the app 
> request, getting a container for my AM, and I see the command line run on the 
> container in my logs. That's all.
> Thanks! If this is a dup or "won't fix" for some reason, let me know and 
> sorry for wasting your time!

