Yesha Vora created YARN-7065:
--------------------------------

             Summary: [RM UI] App status not getting updated in "All application" page
                 Key: YARN-7065
                 URL: https://issues.apache.org/jira/browse/YARN-7065
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Yesha Vora


Scenario:
1) Run a long-running Spark application.
2) Trigger ResourceManager (RM) and NameNode (NN) failovers at random.
3) Validate the application state in YARN.

The Spark applications have finished, and the YARN CLI reports the correct application status:
{code}
[hrt_qa@xxx hadoopqe]$ yarn application -status application_1503203977699_0014
17/08/21 16:56:10 INFO client.AHSProxy: Connecting to Application History server at host1 xxx.xx.xx.x:10200
17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
Application Report : 
        Application-Id : application_1503203977699_0014
        Application-Name : org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources
        Application-Type : SPARK
        User : hrt_qa
        Queue : default
        Application Priority : null
        Start-Time : 1503215983532
        Finish-Time : 1503250203806
        Progress : 0%
        State : FAILED
        Final-State : FAILED
        Tracking-URL : https://host1:8090/cluster/app/application_1503203977699_0014
        RPC Port : -1
        AM Host : N/A
        Aggregate Resource Allocation : 174722793 MB-seconds, 170603 vcore-seconds
        Log Aggregation Status : SUCCEEDED
        Diagnostics : Application application_1503203977699_0014 failed 20 times due to AM Container for appattempt_1503203977699_0014_000020 exited with  exitCode: 1
For more detailed output, check the application tracking page: https://host1:8090/cluster/app/application_1503203977699_0014 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e04_1503203977699_0014_20_000001
Exit code: 1
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Shell output: main : command provided 1
main : run as user is hrt_qa
main : requested yarn user is hrt_qa
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1503203977699_0014/container_e04_1503203977699_0014_20_000001/container_e04_1503203977699_0014_20_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
        Unmanaged Application : false
        Application Node Label Expression : <Not set>
        AM container Node Label Expression : <DEFAULT_PARTITION>{code}
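For step 3 of the scenario, it can help to pull the State and Final-State fields out of a `yarn application -status` report programmatically instead of reading them by eye. The following is a minimal sketch under stated assumptions: the helper names `parse_app_report` and `run_duration_ms` are hypothetical, it assumes each indented `Key : Value` field sits on one line, and multi-line values such as Diagnostics keep only their first line. The sample below copies only a few fields from the report above. Start-Time and Finish-Time are treated as epoch milliseconds.

```python
import re
from typing import Dict

# Hypothetical helper: matches indented "Key : Value" lines of the report.
# Unindented lines (e.g. "Application Report :", shell output) are skipped.
FIELD_RE = re.compile(r"^\s+([A-Za-z][\w\- ]*?)\s*:\s*(.*)$")


def parse_app_report(text: str) -> Dict[str, str]:
    """Return a dict of the indented 'Key : Value' fields in the report text."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def run_duration_ms(fields: Dict[str, str]) -> int:
    """Finish-Time minus Start-Time, both assumed to be epoch milliseconds."""
    return int(fields["Finish-Time"]) - int(fields["Start-Time"])


# Fields copied from the report in this issue.
SAMPLE = """Application Report : 
        Application-Id : application_1503203977699_0014
        Start-Time : 1503215983532
        Finish-Time : 1503250203806
        State : FAILED
        Final-State : FAILED
"""

report = parse_app_report(SAMPLE)
print(report["State"], report["Final-State"])  # FAILED FAILED
print(run_duration_ms(report))  # 34220274 ms, roughly 9 h 30 min
```

The regex approach is deliberately loose. If the report format differs between Hadoop versions, only the pattern needs adjusting.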

However, the RM UI "All Applications" page (https://host1:8090/cluster) still shows the application in the RUNNING state.
Clicking the application ID (https://host1:8090/cluster/app/application_1503203977699_0014) opens the application page, which shows the correct state, FAILED.

The application status is not being updated on the YARN "All Applications" page.
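The mismatch described here is between a list view and a per-application view. A similar consistency check can be sketched against the RM REST API, comparing each entry from `GET /ws/v1/cluster/apps` with the result of `GET /ws/v1/cluster/apps/{appid}`. This is an assumption-laden sketch. This report observes the problem only in the web UI pages and does not establish whether the REST endpoints show the same staleness. The function name `find_stale_apps` is hypothetical, and fetching the JSON (for example from the RM web address in this report) is left out. The sample data below is illustrative and mirrors the states reported above. It is not a captured response.

```python
from typing import Dict, List


def find_stale_apps(apps_list: dict, details: Dict[str, dict]) -> List[str]:
    """Return IDs whose state in the list response differs from the per-app response.

    apps_list: parsed JSON of GET /ws/v1/cluster/apps, shaped {"apps": {"app": [...]}}
    details:   app ID -> parsed JSON of GET /ws/v1/cluster/apps/{appid}, shaped {"app": {...}}
    """
    stale: List[str] = []
    # Tolerate an empty or missing "apps" object.
    entries = (apps_list.get("apps") or {}).get("app", [])
    for entry in entries:
        detail = details.get(entry["id"])
        # Only compare apps for which a per-app response was fetched.
        if detail is not None and detail["app"]["state"] != entry["state"]:
            stale.append(entry["id"])
    return stale


# Illustrative data mirroring this report: list view RUNNING, detail view FAILED.
APP_ID = "application_1503203977699_0014"
list_view = {"apps": {"app": [{"id": APP_ID, "state": "RUNNING"}]}}
detail_view = {APP_ID: {"app": {"id": APP_ID, "state": "FAILED"}}}

print(find_stale_apps(list_view, detail_view))  # ['application_1503203977699_0014']
```

Running such a check after each RM failover in step 2 would show whether a list-style query lags behind the per-application state.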




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
