[jira] [Commented] (YARN-477) When default container executor fails right away, at the CLI launching our App Master, Client doesn't always get the signal to kill the job

Eli Reisman (JIRA) Thu, 14 Mar 2013 16:48:13 -0700

    [ https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602895#comment-13602895 ]


Eli Reisman commented on YARN-477:
----------------------------------

The NodeManager log for the MiniYARNCluster DID get a log report for the app 
master, one that could only have come from the shell command failing:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/giraph/yarn/GiraphApplicationMaster
Caused by: java.lang.ClassNotFoundException: org.apache.giraph.yarn.GiraphApplicationMaster
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
{code}

So that's good. But I don't think the failure is being propagated to the 
MiniYARNCluster's RM or to my Client. From my Client's end, the logs are 
endless heartbeat messages with exitCode -1000 until I Ctrl-C out of the 
test suite.
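
A minimal sketch of a client-side guard against this kind of hang, assuming the client polls the application state in a loop. Everything here is hypothetical and for illustration only: the AppState enum is a stand-in, not the real Hadoop enum, and the Supplier stands in for whatever call the client actually uses to fetch the application report. It is a workaround sketch under those assumptions, not a fix for the underlying propagation problem.

```java
import java.util.function.Supplier;

public class LaunchWatchdog {
    /** Stand-in for the states a YARN client sees; NOT the real Hadoop enum. */
    enum AppState { SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    /** Outcome of waiting for the App Master to come up. */
    enum Outcome { STARTED, TERMINATED, TIMED_OUT }

    /**
     * Polls the (hypothetical) report hook until the app leaves the
     * pre-launch states or the deadline passes.
     */
    static Outcome awaitLaunch(Supplier<AppState> report, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            switch (report.get()) {
                case RUNNING:
                    return Outcome.STARTED;
                case FINISHED:
                case FAILED:
                case KILLED:
                    return Outcome.TERMINATED;
                default:
                    break; // SUBMITTED or ACCEPTED: keep waiting
            }
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT; // caller would kill the app here
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Treat interruption like a timeout: restore the flag and stop waiting.
                Thread.currentThread().interrupt();
                return Outcome.TIMED_OUT;
            }
        }
    }

    public static void main(String[] args) {
        // Simulates the hang described above: the app never leaves ACCEPTED.
        System.out.println(awaitLaunch(() -> AppState.ACCEPTED, 50, 10));
    }
}
```

With a guard like this, a Client stuck at ACCEPTED would give up after the timeout instead of heartbeating until someone interrupts it by hand. The timeout and poll interval are arbitrary values picked for the example.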

FYI, this is not a priority or a blocker for my Giraph on YARN work. In case 
I wasn't clear, it all works now (including the test). But it should probably 
get investigated and fixed soon if I've really found something here ;)


                
> When default container executor fails right away, at the CLI launching our 
> App Master, Client doesn't always get the signal to kill the job
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-477
>                 URL: https://issues.apache.org/jira/browse/YARN-477
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Eli Reisman
>            Assignee: Zhijie Shen
>
> I have been porting Giraph to YARN (GIRAPH-13 is the issue) and when I launch 
> my App Master, if the container command line runs it successfully, any 
> failure in the App Master or my launched Giraph Tasks promptly reports to 
> Client and ends my job run. However, if the command line sent to the app 
> master container fails to launch it at all, the error exit code is not 
> propagating. My client hangs with the job at containersUsed == 1 and state == 
> ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way 
> out.
> Disclaimer: this could be my fault. But I wanted to throw it out there in 
> case it's not. When this happens I am also not getting error logs, since the 
> app master never launched, so I really have no visibility into why it failed 
> to launch. I am sure it's not launching, but the client IS sending the app 
> request, getting a container for my AM, and I see the command line run on the 
> container in my logs. That's all.
> Thanks! If this is a dup or "won't fix" for some reason, let me know and 
> sorry for wasting your time!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
