[ https://issues.apache.org/jira/browse/YARN-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116843#comment-14116843 ]

Karthik Kambatla commented on YARN-2476:
----------------------------------------

As Tsuyoshi said, this is the expected behavior today. The apps are 
re-submitted after restart/failover. Because we re-submit them 
programmatically, the jobs all come in at roughly the same time, and any of 
them could be scheduled first. If you want strictly FIFO behavior, you can set 
the scheduling policy for that queue to fifo. Once work-preserving RM restart 
is done, we won't kill the running AMs and resubmit the apps, so the order 
from before the restart/failover will be preserved. 
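
For reference, a minimal sketch of that allocation-file change, modeled on the 
config quoted below (only the schedulingPolicy value differs; the other values 
are taken from the reporter's file):

<queue name="dev">
  <minResources>10000 mb,10vcores</minResources>
  <maxResources>19000 mb,100vcores</maxResources>
  <!-- fifo schedules apps in this queue in submission order -->
  <schedulingPolicy>fifo</schedulingPolicy>
</queue>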

Orthogonal to this: FileSystemRMStateStore is fine for RM restart, but can have 
fencing issues with RM HA. We recommend using ZKRMStateStore for RM HA. 
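
For anyone switching, a sketch of the relevant yarn-site.xml entries (the 
ZooKeeper hosts zk1/zk2/zk3 below are placeholders for your own quorum):

<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <!-- placeholder ZooKeeper ensemble; replace with your quorum -->
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>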

I propose closing this as a duplicate of (is part of) YARN-556.

> Apps are scheduled in random order after RM failover
> ----------------------------------------------------
>
>                 Key: YARN-2476
>                 URL: https://issues.apache.org/jira/browse/YARN-2476
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.1
>         Environment: Linux
>            Reporter: Santosh Marella
>              Labels: ha, high-availability, resourcemanager
>
> RM HA is configured with 2 RMs, using FileSystemRMStateStore.
> The FairScheduler allocation file is configured in yarn-site.xml:
> <property>
>   <name>yarn.scheduler.fair.allocation.file</name>
>   <value>/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/allocation-pools.xml</value>
> </property>
> FS allocation-pools.xml:
> <?xml version="1.0"?>
> <allocations>
>    <queue name="dev">
>       <minResources>10000 mb,10vcores</minResources>
>           <maxResources>19000 mb,100vcores</maxResources>
>           <maxRunningApps>5525</maxRunningApps>
>           <weight>4.5</weight>
>           <schedulingPolicy>fair</schedulingPolicy>
>           <fairSharePreemptionTimeout>3600</fairSharePreemptionTimeout>
>    </queue>
>    <queue name="default">
>       <minResources>10000 mb,10vcores</minResources>
>           <maxResources>19000 mb,100vcores</maxResources>
>           <maxRunningApps>5525</maxRunningApps>
>           <weight>1.5</weight>
>           <schedulingPolicy>fair</schedulingPolicy>
>           <fairSharePreemptionTimeout>3600</fairSharePreemptionTimeout>
>    </queue>
>     <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
>     <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
> </allocations>
> Submitted 10 sleep jobs to a FS queue using the command:
> hadoop jar hadoop-mapreduce-examples-2.4.1-mapr-4.0.1-SNAPSHOT.jar sleep -Dmapreduce.job.queuename=root.dev -m 10 -r 10 -mt 10000 -rt 10000
> All the jobs were submitted by the same user, with the same priority, and to the
> same queue. No other jobs were running in the cluster. Jobs started executing in
> the order in which they were submitted (jobs 6 to 10 were active, while 11 to 15
> were waiting):
> root@perfnode131:/opt/mapr/hadoop/hadoop-2.4.1/logs# yarn application -list
> Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0010  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:52799
> application_1408572781346_0008  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode131:33766
> application_1408572781346_0009  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:50964
> application_1408572781346_0007  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:52966
> application_1408572781346_0015  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0006  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    9.5%      http://perfnode134:34094
> application_1408572781346_0013  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> Stopped RM1. There was a failover and RM2 became active. But the jobs seem to
> have started in a different order:
> root@perfnode131:~/scratch/raw_rm_logs_fs_hang# yarn application -list
> 14/08/21 07:26:13 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:59351
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:37866
> application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode131:59744
> application_1408572781346_0010  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0008  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0009  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0007  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0015  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:39754
> application_1408572781346_0006  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0013  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:34714
> The problem is this:
> - The jobs that were previously in RUNNING state moved to ACCEPTED after failover.
> - The jobs that were previously in ACCEPTED state moved to RUNNING after failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
