Eric O. LEBIGOT (EOL) created YARN-3570:
-------------------------------------------

             Summary: Non-zero exit status of master application not propagated
                 Key: YARN-3570
                 URL: https://issues.apache.org/jira/browse/YARN-3570
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.4.0
         Environment: PySpark on AWS EMR
            Reporter: Eric O. LEBIGOT (EOL)


The master of my application fails, but the reported "Final app status" is 
SUCCEEDED with exit code 0. This causes all sorts of problems: EMR does not 
detect the failure, my data pipeline continues, etc.

Here is what happens. The master fails, as shown by the relevant lines from 
daemons/i-…/yarn-hadoop-nodemanager-ip-….log.gz:
{quote}
2015-05-02 03:32:11,000 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor 
(ContainersLauncher #0): Exit code from container 
container_1430537363277_0001_01_000001 is : 1
2015-05-02 03:32:11,001 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor 
(ContainersLauncher #0): Exception from container-launch with container ID: 
container_1430537363277_0001_01_000001 and exit code: 1
2015-05-02 03:32:11,003 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch
 (ContainersLauncher #0): Container exited with a non-zero exit code 1
2015-05-02 03:32:11,004 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container 
(AsyncDispatcher event handler): Container 
container_1430537363277_0001_01_000001 transitioned from RUNNING to 
EXITED_WITH_FAILURE
2015-05-02 03:32:11,032 WARN 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger (AsyncDispatcher event 
handler): USER=hadoop       OPERATION=Container Finished - Failed   
TARGET=ContainerImpl    RESULT=FAILURE  DESCRIPTION=Container failed with 
state: EXITED_WITH_FAILURE    APPID=application_1430537363277_0001   
CONTAINERID=container_1430537363277_0001_01_000001
2015-05-02 03:32:11,032 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container 
(AsyncDispatcher event handler): Container 
container_1430537363277_0001_01_000001 transitioned from EXITED_WITH_FAILURE to 
DONE
{quote}
and from ./daemons/i-…/yarn-hadoop-resourcemanager-ip-….log.gz:
{quote}
2015-05-02 03:32:10,493 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl 
(AsyncDispatcher event handler): Updating application attempt 
appattempt_1430537363277_0001_000001 with final state: FINISHING, and exit 
status: -1000
{quote}

Nevertheless, the application as a whole reports exit code 0, as shown in 
./task-attempts/application_1430537363277_0001/container_1430537363277_0001_01_000001/stderr.gz:
{quote}
15/05/02 03:32:10 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, 
exitCode: 0, (reason: Shutdown hook called before final status was reported.)
{quote}
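
The mismatch between the two logs can be detected mechanically. The following 
is only a minimal sketch: the regular expressions are modelled on the excerpts 
quoted in this report and may not match other Hadoop or Spark versions, and 
normalize and find_mismatch are hypothetical helper names, not part of YARN or 
Spark.

```python
import re

# Patterns modelled on the log excerpts quoted in this report (assumption:
# other Hadoop/Spark versions may format these lines differently).
AM_STATUS = re.compile(r"Final app status: (\w+), exitCode: (-?\d+)")
NM_EXIT = re.compile(r"Exit code from container (\S+) is : (-?\d+)")


def normalize(text):
    """Collapse hard line wraps so a wrapped log record becomes one line."""
    return " ".join(text.split())


def find_mismatch(am_stderr, nm_log):
    """Hypothetical helper: compare the ApplicationMaster and NodeManager logs.

    Returns (am_status, am_exit_code, container_id, nm_exit_code) when the
    ApplicationMaster reported exit code 0 but the NodeManager logged a
    non-zero container exit code; returns None otherwise.
    """
    am = AM_STATUS.search(normalize(am_stderr))
    nm = NM_EXIT.search(normalize(nm_log))
    if am is None or nm is None:
        return None
    am_code, nm_code = int(am.group(2)), int(nm.group(2))
    if am_code == 0 and nm_code != 0:
        return am.group(1), am_code, nm.group(1), nm_code
    return None


# Excerpts copied from the logs quoted in this report.
am_excerpt = """15/05/02 03:32:10 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED,
exitCode: 0, (reason: Shutdown hook called before final status was reported.)"""
nm_excerpt = """2015-05-02 03:32:11,000 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
(ContainersLauncher #0): Exit code from container
container_1430537363277_0001_01_000001 is : 1"""

print(find_mismatch(am_excerpt, nm_excerpt))
```

With the two excerpts above, the check reports the container that exited with 
code 1 even though the ApplicationMaster logged SUCCEEDED.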

The reason given in that last line (the shutdown hook being called before the 
final status was reported) may explain why the error is hidden. Is this a 
YARN bug? Or is it more likely that the AWS EMR cluster manager I am using is 
involved, for example by detecting the task failure before YARN does and 
shutting down the PySpark application that was running on YARN?



