[jira] [Commented] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725584#comment-16725584
 ] 

Yuqi Wang commented on YARN-9151:
-

Thanks [~elgoiri], let me try to fix the test and style issues, and add a UT for 
UnknownHostException.

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151.001.patch
>
>
> {color:#205081}*Issue Summary:*{color}
>  Standby RM hangs (not retry or crash) forever due to forever lost from 
> leader election
>  
> {color:#205081}*Issue Repro Steps:*{color}
>  # Start multiple RMs in HA mode
>  # Modify all hostnames in the zk connect string to different values in DNS.
>  (In reality, we needed to replace old/bad zk machines with new/good zk machines, so their DNS hostnames changed.)
>  
> {color:#205081}*Issue Logs:*{color}
> See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).
> To make it clear, the whole story is:
> {noformat}
> Join Election
> Win the leader (ZK Node Creation Callback)
>   Start to becomeActive 
> Start RMActiveServices 
> Start CommonNodeLabelsManager failed due to zk connect 
> UnknownHostException
> Stop CommonNodeLabelsManager
> Stop RMActiveServices
> Create and Init RMActiveServices
>   Fail to becomeActive 
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
>   Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Already in standby state
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
>   Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Found RMActiveServices's StandByTransitionRunnable object has already run 
> previously, so immediately return
>  
> {noformat}
> The standby RM failed to rejoin the election, but it will never retry or crash later, *so afterwards there are no zk-related logs and the standby RM hangs forever, even if the zk connect string hostnames are changed back to the original ones in DNS.*
>  So, this should be a bug in RM, because *RM should always try to join the election* (giving up joining the election should only happen when RM decides to crash); otherwise, an RM outside the election can never become active again and do real work.
>  
> {color:#205081}*Caused By:*{color}
> It was introduced by YARN-3742.
> What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED RMFatalEvent happens, the RM should transition to standby instead of crashing.
>  *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to standby, instead of crashing.* (In contrast, before this change, the RM crashed on all of them instead of going to standby.)
>  So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will leave the standby RM unable to work, such as staying in standby forever.
> And as the author said:
> {quote}I think a good approach here would be to change the RMFatalEvent 
> handler to transition to standby as the default reaction, *with shutdown as a 
> special case for certain types of failures.*
> {quote}
> But the author was *too optimistic when implementing the patch.*
>  
> {color:#205081}*What the Patch's solution:*{color}
> So, to be *conservative*, we had better *only transition to standby for the failures in the {color:#14892c}whitelist{color}:*
>  public enum RMFatalEventType {
>  {color:#14892c}// Source <- Store{color}
>  {color:#14892c}STATE_STORE_FENCED,{color}
>  {color:#14892c}STATE_STORE_OP_FAILED,{color}
> // Source <- Embedded Elector
>  EMBEDDED_ELECTOR_FAILED,
> {color:#14892c}// Source <- Admin Service{color}
>  {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}
> // Source <- Critical Thread Crash
>  CRITICAL_THREAD_CRASH
>  }
> And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and failure types added in the future (until we have triaged them into the whitelist), should crash the RM, because we *cannot ensure* that they will *never* leave the RM unable to work in standby state, and the *conservative* way is to crash the RM. 
>  Besides, after the crash, the RM's external watchdog service can notice it and try to repair the RM machine, send alerts, etc. 
>  And the RM can reload the latest zk connect string config with the latest hostnames.

[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS.
 (In reality, we needed to replace old/bad zk machines with new/good zk machines, so their DNS hostnames changed.)

 

{color:#205081}*Issue Logs:*{color}

See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
 
{noformat}
The standby RM failed to rejoin the election, but it will never retry or crash later, *so afterwards there are no zk-related logs and the standby RM hangs forever, even if the zk connect string hostnames are changed back to the original ones in DNS.*
 So, this should be a bug in RM, because *RM should always try to join the election* (giving up joining the election should only happen when RM decides to crash); otherwise, an RM outside the election can never become active again and do real work.

 

{color:#205081}*Caused By:*{color}

It was introduced by YARN-3742.

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED RMFatalEvent happens, the RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to standby, instead of crashing.* (In contrast, before this change, the RM crashed on all of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will leave the standby RM unable to work, such as staying in standby forever.

And as the author said:
{quote}I think a good approach here would be to change the RMFatalEvent handler 
to transition to standby as the default reaction, *with shutdown as a special 
case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

{color:#205081}*What the Patch's solution:*{color}

So, to be *conservative*, we had better *only transition to standby for the failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }

And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and failure types added in the future (until we have triaged them into the whitelist), should crash the RM, because we *cannot ensure* that they will *never* leave the RM unable to work in standby state, and the *conservative* way is to crash the RM. 
 Besides, after the crash, the RM's external watchdog service can notice it and try to repair the RM machine, send alerts, etc. 
 And the RM can reload the latest zk connect string config with the latest hostnames.

For more details, please check the patch.
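
To make the intended policy concrete, below is a minimal, self-contained sketch (illustration only, not the patch code; the class and method names here are made up) of dispatching on RMFatalEventType with a standby whitelist and a crash fallback:
{code:java}
import java.util.EnumSet;

// Minimal sketch of the proposed dispatch policy (illustration only, not RM code).
public class FatalEventPolicySketch {

  // Mirrors the RMFatalEventType values listed above.
  enum RMFatalEventType {
    STATE_STORE_FENCED,          // Source <- Store
    STATE_STORE_OP_FAILED,       // Source <- Store
    EMBEDDED_ELECTOR_FAILED,     // Source <- Embedded Elector
    TRANSITION_TO_ACTIVE_FAILED, // Source <- Admin Service
    CRITICAL_THREAD_CRASH        // Source <- Critical Thread Crash
  }

  // Whitelist: failures triaged as safe to wait out in standby state.
  static final EnumSet<RMFatalEventType> STANDBY_WHITELIST = EnumSet.of(
      RMFatalEventType.STATE_STORE_FENCED,
      RMFatalEventType.STATE_STORE_OP_FAILED,
      RMFatalEventType.TRANSITION_TO_ACTIVE_FAILED);

  static void handle(RMFatalEventType type) {
    if (STANDBY_WHITELIST.contains(type)) {
      // Triaged failure: transition to standby and wait to win the election again.
      System.out.println("Transitioning RM to standby on " + type);
    } else {
      // Untriaged failure (e.g. EMBEDDED_ELECTOR_FAILED, CRITICAL_THREAD_CRASH):
      // crash so an external watchdog can repair/alert, and the RM reloads the
      // latest zk connect string config on restart.
      System.err.println("Fatal error, shutting down RM on " + type);
      System.exit(1);
    }
  }

  public static void main(String[] args) {
    handle(RMFatalEventType.STATE_STORE_OP_FAILED);   // -> standby
    handle(RMFatalEventType.EMBEDDED_ELECTOR_FAILED); // -> crash
  }
}
{code}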


[jira] [Comment Edited] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724903#comment-16724903
 ] 

Yuqi Wang edited comment on YARN-9151 at 12/19/18 11:48 AM:


BTW, [~jianhe], for YARN-4438, you said:
{quote}_If it is due to close(), don't we want to force give-up so the other RM 
becomes active? If it is on initAndStartLeaderLatch(), *this RM will never 
become active; don't we want to just die?*_

What do you mean by force give-up ? exit RM ?
 The underlying curator implementation *will retry the connection in 
background*, even though the exception is thrown. See *Guaranteeable* interface 
in Curator. I think exit RM is too harsh here. Even though RM remains at 
standby, all services should be already shutdown, so there's no harm to the end 
users ?
{quote}
However, for this case, if we are using CuratorBasedElectorService, I think Curator will *NOT* retry the connection, because I saw the following in the log and checked Curator's code:

*Background exception was not retry-able or retry gave up for 
UnknownHostException*
{code:java}
2018-12-14 14:14:20,847 ERROR [Curator-Framework-0] 
org.apache.curator.framework.imps.CuratorFrameworkImpl: Background exception 
was not retry-able or retry gave up
java.net.UnknownHostException: BN2AAP10C07C229
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at 
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at 
org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:461)
at 
org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:146)
at org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
at org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
at org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
at 
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:193)
at 
org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at 
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:806)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

Besides, in YARN-4438, I did not see the Curator *Guaranteeable* interface being used.

Could you please confirm the above?

So, in the patch, if rejoining the election throws an exception, it will send EMBEDDED_ELECTOR_FAILED, and then the RM will crash.
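
For reference, here is a minimal, self-contained sketch (illustration only, not the RM or elector code; the hostnames are made up) of building a Curator client against a connect string whose hostnames no longer resolve. The retry policy only governs ZooKeeper operations, while here the ZooKeeper handle itself cannot be constructed, because StaticHostProvider resolves all hostnames up front, which matches the stack trace above:
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Illustration only: reproduces the "Background exception was not retry-able
// or retry gave up" + UnknownHostException pattern seen in the attached RM log.
public class CuratorUnknownHostSketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.builder()
        .connectString("old-zk-host-1:2181,old-zk-host-2:2181") // hypothetical stale hostnames
        .sessionTimeoutMs(10000)
        .retryPolicy(new ExponentialBackoffRetry(1000, 3))
        .build();
    client.start();
    // The Curator background thread tries to create the ZooKeeper handle;
    // StaticHostProvider -> InetAddress.getAllByName throws UnknownHostException,
    // and CuratorFrameworkImpl logs the error shown above.
    Thread.sleep(30000);
    client.close();
  }
}
{code}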



[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS.
 (In reality, we needed to replace old/bad zk machines with new/good zk machines, so their DNS hostnames changed.)

 

{color:#205081}*Issue Logs:*{color}

See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to rejoin the election, but it will never retry or crash later, so afterwards there are no zk-related logs and the standby RM hangs forever, even if the zk connect string hostnames are changed back to the original ones in DNS.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the election* (giving up joining the election should only happen when RM decides to crash); otherwise, an RM outside the election can never become active again and do real work.

 

{color:#205081}*Caused By:*{color}

It was introduced by YARN-3742.

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED RMFatalEvent happens, the RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to standby, instead of crashing.* (In contrast, before this change, the RM crashed on all of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will leave the standby RM unable to work, such as staying in standby forever.

And as the author [said|https://issues.apache.org/jira/browse/YARN-3742?focusedCommentId=15891385&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15891385]:
{quote}I think a good approach here would be to change the RMFatalEvent handler to transition to standby as the default reaction, *with shutdown as a special case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

{color:#205081}*What the Patch's solution:*{color}

So, to be *conservative*, we had better *only transition to standby for the failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }

And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and failure types added in the future (until we have triaged them into the whitelist), should crash the RM, because we *cannot ensure* that they will *never* leave the RM unable to work in standby state, and the *conservative* way is to crash the RM. 
Besides, after the crash, the RM's external watchdog service can notice it and try to repair the RM machine, send alerts, etc. 
And the RM can reload the latest zk connect string config with the latest hostnames.

For more details, please check the patch.


[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS.
 (In reality, we needed to replace old/bad zk machines with new/good zk machines, so their DNS hostnames changed.)

 

{color:#205081}*Issue Logs:*{color}

See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to rejoin the election, but it will never retry or crash later, so afterwards there are no zk-related logs and the standby RM hangs forever, even if the zk connect string hostnames are changed back to the original ones in DNS.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the election* (giving up joining the election should only happen when RM decides to crash); otherwise, an RM outside the election can never become active again and do real work.

 

{color:#205081}*Caused By:*{color}

It was introduced by YARN-3742.

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED RMFatalEvent happens, the RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to standby, instead of crashing.* (In contrast, before this change, the RM crashed on all of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will leave the standby RM unable to work, such as staying in standby forever.

And as the author [said|https://issues.apache.org/jira/browse/YARN-3742?focusedCommentId=15891385&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15891385]:
{quote}I think a good approach here would be to change the RMFatalEvent handler to transition to standby as the default reaction, *with shutdown as a special case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

{color:#205081}*What the Patch's solution:*{color}

So, to be *conservative*, we had better *only transition to standby for the failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }

And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and failure types added in the future (until we have triaged them into the whitelist), should crash the RM, because we *cannot ensure* that they will *never* leave the RM unable to work in standby state, and the *conservative* way is to crash the RM. Besides, after the crash, the RM's external watchdog service can notice it and try to repair the RM machine, send alerts, etc. And the RM can reload the latest zk connect string config with the latest hostnames.

For more details, please check the patch.


[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Attachment: (was: YARN-9151.001.patch)


[jira] [Commented] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724913#comment-16724913
 ] 

Yuqi Wang commented on YARN-9151:
-

[~elgoiri], could you please also check it.


[jira] [Comment Edited] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724913#comment-16724913
 ] 

Yuqi Wang edited comment on YARN-9151 at 12/19/18 11:37 AM:


[~elgoiri], could you please also check it. :)


was (Author: yqwang):
[~elgoiri], could you please also check it.


[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS.
 (In reality, we needed to replace old/bad zk machines with new/good zk machines, so their DNS hostnames changed.)

 

{color:#205081}*Issue Logs:*{color}

See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to rejoin the election, but it will never retry or crash later, so afterwards there are no zk-related logs and the standby RM hangs forever, even if the zk connect string hostnames are changed back to the original ones in DNS.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the election* (giving up joining the election should only happen when RM decides to crash); otherwise, an RM outside the election can never become active again and do real work.

 

{color:#205081}*Caused By:*{color}

It was introduced by YARN-3742.

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED RMFatalEvent happens, the RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to standby, instead of crashing.* (In contrast, before this change, the RM crashed on all of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will leave the standby RM unable to work, such as staying in standby forever.

And as the author [said|https://issues.apache.org/jira/browse/YARN-3742?focusedCommentId=15891385&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15891385]:
{quote}I think a good approach here would be to change the RMFatalEvent handler to transition to standby as the default reaction, *with shutdown as a special case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

{color:#205081}*What the Patch's solution:*{color}

So, to be *conservative*, we had better *only transition to standby for the failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }

And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and failure types added in the future, should crash the RM, because we *cannot ensure* that they will *never* leave the RM unable to work in standby state, and the *conservative* way is to crash the RM. Besides, after the crash, the RM's external watchdog service can notice it and try to repair the RM machine, send alerts, etc. And the RM can reload the latest zk connect string config with the latest hostnames.

For more details, please check the patch.


[jira] [Comment Edited] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724903#comment-16724903
 ] 

Yuqi Wang edited comment on YARN-9151 at 12/19/18 11:28 AM:


BTW, [~jianhe], for YARN-4438, you said:
{quote}_If it is due to close(), don't we want to force give-up so the other RM 
becomes active? If it is on initAndStartLeaderLatch(), *this RM will never 
become active; don't we want to just die?*_

What do you mean by force give-up ? exit RM ?
 The underlying curator implementation *will retry the connection in 
background*, even though the exception is thrown. See *Guaranteeable* interface 
in Curator. I think exit RM is too harsh here. Even though RM remains at 
standby, all services should be already shutdown, so there's no harm to the end 
users ?
{quote}
However, for this case, if we are using CuratorBasedElectorService, I think Curator will *NOT* retry the connection, because I saw the following in the log and checked Curator's code:

*Background exception was not retry-able or retry gave up for 
UnknownHostException*
{code:java}
2018-12-14 14:14:20,847 ERROR [Curator-Framework-0] 
org.apache.curator.framework.imps.CuratorFrameworkImpl: Background exception 
was not retry-able or retry gave up
java.net.UnknownHostException: BN2AAP10C07C229
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at 
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at 
org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:461)
at 
org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:146)
at org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
at org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
at org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
at 
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:193)
at 
org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at 
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:806)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
Besides, in YARN-4438, I did not see the Curator *Guaranteeable* interface being used.

So, in the patch, if rejoining the election throws an exception, it will send EMBEDDED_ELECTOR_FAILED, and then the RM will crash and reload the latest zk connect string config.



[jira] [Comment Edited] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724903#comment-16724903
 ] 

Yuqi Wang edited comment on YARN-9151 at 12/19/18 11:28 AM:


BTW, [~jianhe], for YARN-4438, you said:
{quote}_If it is due to close(), don't we want to force give-up so the other RM 
becomes active? If it is on initAndStartLeaderLatch(), *this RM will never 
become active; don't we want to just die?*_

What do you mean by force give-up ? exit RM ?
 The underlying curator implementation *will retry the connection in 
background*, even though the exception is thrown. See *Guaranteeable* interface 
in Curator. I think exit RM is too harsh here. Even though RM remains at 
standby, all services should be already shutdown, so there's no harm to the end 
users ?
{quote}
However, for this case, if we are using CuratorBasedElectorService, I think Curator will *NOT* retry the connection, because I saw the following in the log and checked Curator's code:

*Background exception was not retry-able or retry gave up for 
UnknownHostException*
{code:java}
2018-12-14 14:14:20,847 ERROR [Curator-Framework-0] 
org.apache.curator.framework.imps.CuratorFrameworkImpl: Background exception 
was not retry-able or retry gave up
java.net.UnknownHostException: BN2AAP10C07C229
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at 
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at 
org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:461)
at 
org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:146)
at org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
at org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
at org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
at 
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:193)
at 
org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at 
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:806)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
Besides, in YARN-4438, I did not see the Curator *Guaranteeable* interface being used.

So, in the patch, if rejoining the election throws an exception, it will send EMBEDDED_ELECTOR_FAILED, and then the RM will crash.



[jira] [Commented] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724903#comment-16724903
 ] 

Yuqi Wang commented on YARN-9151:
-

BTW, [~jianhe], for YARN-4438, you said:
{quote}_If it is due to close(), don't we want to force give-up so the other RM 
becomes active? If it is on initAndStartLeaderLatch(), *this RM will never 
become active; don't we want to just die?*_

What do you mean by force give-up ? exit RM ?
 The underlying curator implementation *will retry the connection in 
background*, even though the exception is thrown. See *Guaranteeable* interface 
in Curator. I think exit RM is too harsh here. Even though RM remains at 
standby, all services should be already shutdown, so there's no harm to the end 
users ?
{quote}
However, for this case, if we are using CuratorBasedElectorService, I think Curator will *NOT* retry the connection, because I saw the following in the log and checked Curator's code:

*Background exception was not retry-able or retry gave up for 
UnknownHostException*
{code:java}
2018-12-14 14:14:20,847 ERROR [Curator-Framework-0] 
org.apache.curator.framework.imps.CuratorFrameworkImpl: Background exception 
was not retry-able or retry gave up
java.net.UnknownHostException: BN2AAP10C07C229
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at 
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at 
org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:461)
at 
org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:146)
at org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
at org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
at org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
at 
org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:193)
at 
org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at 
org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:806)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
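
For illustration, here is a minimal sketch of the retry decision implied by that 
log line. It is an assumption, not Curator's real implementation: the helper 
name is hypothetical, and the point is only that a java.net.UnknownHostException 
is not a ZooKeeper KeeperException, so the background operation treats it as 
non-retryable and gives up.
{code:java}
import java.net.UnknownHostException;
import org.apache.zookeeper.KeeperException;

public class RetryDecisionSketch {
  // Hypothetical helper: mirrors the observed behavior that only
  // ZooKeeper-level connection problems are considered retry-able,
  // while name-resolution failures such as UnknownHostException are not.
  static boolean isRetryableBackgroundException(Throwable t) {
    if (t instanceof KeeperException) {
      KeeperException.Code code = ((KeeperException) t).code();
      return code == KeeperException.Code.CONNECTIONLOSS
          || code == KeeperException.Code.OPERATIONTIMEOUT
          || code == KeeperException.Code.SESSIONEXPIRED;
    }
    // UnknownHostException (and other non-KeeperExceptions) fall through here,
    // matching the "Background exception was not retry-able" log above.
    return false;
  }

  public static void main(String[] args) {
    System.out.println(
        isRetryableBackgroundException(new UnknownHostException("BN2AAP10C07C229")));
    // prints: false
  }
}
{code}
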
Besides, in YARN-4438, I did not see the *Guaranteeable* interface in Curator 
being used.

So, in the patch, if rejoining the election throws an exception, it will send 
EMBEDDED_ELECTOR_FAILED, and then RM will crash and, on restart, reload the 
latest zk connect string config.
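
A minimal, self-contained sketch of that intended behavior is below. The class 
and method names are hypothetical placeholders, and it collapses the event step 
(the real patch sends an EMBEDDED_ELECTOR_FAILED RMFatalEvent whose handling 
ends in a crash) into a direct exit; it is not the actual YARN-9151.001.patch 
code.
{code:java}
// Illustrative only: hypothetical names, not the actual YARN-9151 patch code.
public class RejoinElectionSketch {

  interface Elector {
    void rejoinElection() throws Exception;
  }

  // Conservative handling: if re-joining the election fails, do not swallow the
  // error and stay in standby; instead crash so an external watchdog can
  // restart the RM, which then re-reads the latest zk connect string config.
  static void rejoinOrDie(Elector elector) {
    try {
      elector.rejoinElection();
    } catch (Exception e) {
      System.err.println("Failed to rejoin election, exiting RM: " + e);
      System.exit(1); // the restart picks up the latest configuration
    }
  }
}
{code}
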

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151.001.patch, yarn_rm.zip
>
>
> {color:#205081}*Issue Summary:*{color}
>  Standby RM hangs (not retry or crash) forever due to forever lost from 
> leader election
>  
> {color:#205081}*Issue Repro Steps:*{color}
>  # Start multiple RMs in HA mode
>  # Modify all hostnames in the zk connect string to different values in DNS.
>  (In reality, we need to replace old/bad zk machines to new/good zk machines, 
> so their DNS hostname will be changed.)
>  
> {color:#205081}*Issue Logs:*{color}
> See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).
> To make it clear, the whole story is:
> {noformat}
> Join Election
> Win the leader (ZK Node Creation Callback)
>   Start to becomeActive 
> Start 

[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS.
 (In reality, we need to replace old/bad zk machines to new/good zk machines, 
so their DNS hostname will be changed.)

 

{color:#205081}*Issue Logs:*{color}

See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the 
exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the 
exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to rejoin the election, but it will never retry or 
crash later, so afterwards there are no zk related logs and the standby RM 
hangs forever, even if the zk connect string hostnames are changed back to the 
original ones in DNS.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the 
election* (giving up on joining the election should only happen when RM decides 
to crash); otherwise, an RM that is not in the election can never become active 
again and start real work.

 

{color:#205081}*Caused By:*{color}

It is introduced by YARN-3742

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED 
RMFatalEvent happens, RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to 
standby, instead of crash.* (In contrast, before this change, RM crashed on all 
of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will 
leave the standby RM unable to work, e.g. stuck in standby forever.

And as the author 
[said|https://issues.apache.org/jira/browse/YARN-3742?focusedCommentId=15891385=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15891385]:
{quote}I think a good approach here would be to change the RMFatalEvent handler 
to transition to standby as the default reaction, *with shutdown as a special 
case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

{color:#205081}*What the Patch's solution:*{color}

So, to be *conservative*, we had better *only transition to standby for the 
failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }

And the others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
failure types added in the future, should crash RM, because we *cannot ensure* 
that they will *never* leave RM unable to work in standby state, and the 
*conservative* way is to crash RM. Besides, after a crash, the RM's external 
watchdog service can detect this and try to repair the RM machine, send 
alerts, etc.
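
To make the intended behavior concrete, here is a minimal sketch of such 
whitelist-based handling. The event types mirror RMFatalEventType above, but 
the handler method and the whitelist set are hypothetical illustrations, not 
the actual patch code.
{code:java}
import java.util.EnumSet;

// Illustrative only: hypothetical handler, not the actual YARN-9151 patch code.
public class FatalEventHandlingSketch {

  enum RMFatalEventType {
    STATE_STORE_FENCED,
    STATE_STORE_OP_FAILED,
    EMBEDDED_ELECTOR_FAILED,
    TRANSITION_TO_ACTIVE_FAILED,
    CRITICAL_THREAD_CRASH
  }

  // Whitelist of failures known to be safe to handle by just transitioning
  // to standby; everything else is treated as fatal for the whole process.
  private static final EnumSet<RMFatalEventType> TRANSITION_TO_STANDBY_WHITELIST =
      EnumSet.of(
          RMFatalEventType.STATE_STORE_FENCED,
          RMFatalEventType.STATE_STORE_OP_FAILED,
          RMFatalEventType.TRANSITION_TO_ACTIVE_FAILED);

  static void handleFatalEvent(RMFatalEventType type) {
    if (TRANSITION_TO_STANDBY_WHITELIST.contains(type)) {
      System.out.println("Transitioning RM to standby for " + type);
      // transitionToStandby() would go here
    } else {
      System.err.println("Crashing RM for non-whitelisted fatal event " + type);
      System.exit(1); // let the external watchdog repair, alert, and restart
    }
  }
}
{code}
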

For more details, please check the patch.

  was:
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS.
 (In reality, we need to replace old/bad zk machines to new/good zk machines, 
so their DNS hostname will be changed.)

 

{color:#205081}*Issue Logs:*{color}

See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect 

[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS.
 (In reality, we need to replace old/bad zk machines to new/good zk machines, 
so their DNS hostname will be changed.)

 

{color:#205081}*Issue Logs:*{color}

See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the 
exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException (Here the 
exception is eaten and only an event is sent)
  Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to rejoin the election, but it will never retry or 
crash later, so afterwards there are no zk related logs and the standby RM 
hangs forever, even if the zk connect string hostnames are changed back to the 
original ones in DNS.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the 
election* (giving up on joining the election should only happen when RM decides 
to crash); otherwise, an RM that is not in the election can never become active 
again and start real work.

 

{color:#205081}*Caused By:*{color}

It is introduced by YARN-3742

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED 
RMFatalEvent happens, RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to 
standby, instead of crash.* (In contrast, before this change, RM crashed on all 
of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will 
leave the standby RM unable to work, e.g. stuck in standby forever.

And as the author said:
{quote}I think a good approach here would be to change the RMFatalEvent handler 
to transition to standby as the default reaction, *with shutdown as a special 
case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

{color:#205081}*What the Patch's solution:*{color}

So, to be *conservative*, we had better *only transition to standby for the 
failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }

And the others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
failure types added in the future, should crash RM, because we *cannot ensure* 
that they will *never* leave RM unable to work in standby state, and the 
*conservative* way is to crash RM. Besides, after a crash, the RM's external 
watchdog service can detect this and try to repair the RM machine, send 
alerts, etc.

For more details, please check the patch.

  was:
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines to new/good zk machines, so 
their DNS hostname will be changed.)

 

{color:#205081}*Issue Logs:*{color}

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  

[jira] [Commented] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724895#comment-16724895
 ] 

Yuqi Wang commented on YARN-9151:
-

[~kasha] and [~templedf], could you please look at this issue and fix it.

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151.001.patch, yarn_rm.zip
>
>
> {color:#205081}*Issue Summary:*{color}
>  Standby RM hangs (not retry or crash) forever due to forever lost from 
> leader election
>  
> {color:#205081}*Issue Repro Steps:*{color}
>  # Start multiple RMs in HA mode
>  # Modify all hostnames in the zk connect string to different values in DNS.
>  (In reality, we need to replace old/bad zk machines to new/good zk machines, 
> so their DNS hostname will be changed.)
>  
> {color:#205081}*Issue Logs:*{color}
> See the full RM log in attachment, yarn_rm.zip (The RM is BN4SCH101222318).
> To make it clear, the whole story is:
> {noformat}
> Join Election
> Win the leader (ZK Node Creation Callback)
>   Start to becomeActive 
> Start RMActiveServices 
> Start CommonNodeLabelsManager failed due to zk connect 
> UnknownHostException
> Stop CommonNodeLabelsManager
> Stop RMActiveServices
> Create and Init RMActiveServices
>   Fail to becomeActive 
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException (Here the 
> exception is eat and just send event)
>   Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Already in standby state
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException (Here the 
> exception is eat and just send event)
>   Send EMBEDDED_ELECTOR_FAILED RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Found RMActiveServices's StandByTransitionRunnable object has already run 
> previously, so immediately return
>    
> (The standby RM failed to rejoin the election, but it will never retry or 
> crash later, so afterwards no zk related logs and the standby RM is forever 
> hang, even if the zk connect string hostnames are changed back the orignal 
> ones in DNS.)
> {noformat}
> So, this should be a bug in RM, because *RM should always try to join 
> election* (give up join election should only happen on RM decide to crash), 
> otherwise, a RM without inside the election can never become active again and 
> start real works.
>  
> {color:#205081}*Caused By:*{color}
> It is introduced by YARN-3742
> The JIRA want to improve is that, when STATE_STORE_OP_FAILED RMFatalEvent 
> happens, RM should transition to standby, instead of crash.
>  *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition 
> to standby, instead of crash.* (In contrast, before this change, RM makes all 
> to crash instead of to standby)
>  So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it 
> will leave the standby RM continue not work, such as stay in standby forever.
> And as the author said:
> {quote}I think a good approach here would be to change the RMFatalEvent 
> handler to transition to standby as the default reaction, *with shutdown as a 
> special case for certain types of failures.*
> {quote}
> But the author is *too optimistic when implement the patch.*
>  
> {color:#205081}*What the Patch's solution:*{color}
> So, for *conservative*, we would better *only transition to standby for the 
> failures in {color:#14892c}whitelist{color}:*
>  public enum RMFatalEventType {
>  {color:#14892c}// Source <- Store{color}
>  {color:#14892c}STATE_STORE_FENCED,{color}
>  {color:#14892c}STATE_STORE_OP_FAILED,{color}
> // Source <- Embedded Elector
>  EMBEDDED_ELECTOR_FAILED,
> {color:#14892c}// Source <- Admin Service{color}
>  {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}
> // Source <- Critical Thread Crash
>  CRITICAL_THREAD_CRASH
>  }
> And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
> future added failure types, should crash RM, because we *cannot ensure* that 
> they will *never* cause RM cannot work in standby state, and the 
> *conservative* way is to crash RM. Besides, after crash, the RM's external 
> watchdog service can know this and try to repair the RM machine, send alerts, 
> etc.
> For more details, please check the patch.




[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines to new/good zk machines, so 
their DNS hostname will be changed.)

 

{color:#205081}*Issue Logs:*{color}

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eaten and only a transition-to-Standby event is sent)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException
  (Here the exception is eaten and only a transition-to-Standby event is sent)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to re-join the election, but it will never retry or 
crash later, so afterwards there are no zk related logs and the standby RM 
hangs forever.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the 
election* (giving up on joining the election should only happen when RM decides 
to crash); otherwise, an RM that is not in the election can never become active 
again and start real work.

 

{color:#205081}*Caused By:*{color}

It is introduced by YARN-3742

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED 
RMFatalEvent happens, RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to 
standby, instead of crash.* (In contrast, before this change, RM crashed on all 
of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will 
leave the standby RM unable to work, e.g. stuck in standby forever.

And as the author said:
{quote}I think a good approach here would be to change the RMFatalEvent handler 
to transition to standby as the default reaction, *with shutdown as a special 
case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

{color:#205081}*What the Patch's solution:*{color}

So, to be *conservative*, we had better *only transition to standby for the 
failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }

And the others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
failure types added in the future, should crash RM, because we *cannot ensure* 
that they will never leave RM unable to work in standby state; the 
*conservative* way is to crash RM. Besides, after a crash, the RM watchdog can 
detect this and try to repair the RM machine, send alerts, etc.

For more details, please check the patch.

  was:
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines to new/good zk machines, so 
their DNS hostname will be changed.)

 

{color:#205081}*Issue Logs:*{color}

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eat and just 

[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Fix Version/s: 2.9.2

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1, 2.9.2
>
> Attachments: YARN-9151-branch-2.9.2.001.patch, yarn_rm.zip
>
>
> *Issue Summary:*
>  Standby RM hangs (not retry or crash) forever due to forever lost from 
> leader election
>  
> *Issue Repro Steps:*
>  # Start multiple RMs in HA mode
>  # Modify all hostnames in the zk connect string to different values in DNS. 
> (In reality, we need to replace old/bad zk machines to new/good zk machines, 
> so their DNS hostname will be changed.)
>  
> *Issue Logs:*
> The RM is BN4SCH101222318
> You can check the full RM log in attachment, yarn_rm.zip.
> To make it clear, the whole story is:
> {noformat}
> Join Election
> Win the leader (ZK Node Creation Callback)
>   Start to becomeActive 
> Start RMActiveServices 
> Start CommonNodeLabelsManager failed due to zk connect 
> UnknownHostException
> Stop CommonNodeLabelsManager
> Stop RMActiveServices
> Create and Init RMActiveServices
>   Fail to becomeActive 
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException 
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Already in standby state
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Found RMActiveServices's StandByTransitionRunnable object has already run 
> previously, so immediately return
>    
> (The standby RM failed to re-join the election, but it will never retry or 
> crash later, so afterwards no zk related logs and the standby RM is forever 
> hang.)
> {noformat}
> So, this should be a bug in RM, because *RM should always try to join 
> election* (give up join election should only happen on RM decide to crash), 
> otherwise, a RM without inside the election can never become active again and 
> start real works.
>  
> *Caused By:*
> It is introduced by YARN-3742
> The JIRA want to improve is that, when STATE_STORE_OP_FAILED RMFatalEvent 
> happens, RM should transition to standby, instead of crash.
>  *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition 
> to standby, instead of crash.* (In contrast, before this change, RM makes all 
> to crash instead of to standby)
>  So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it 
> will leave the standby RM continue not work, such as stay in standby forever.
> And as the author said:
> {quote}I think a good approach here would be to change the RMFatalEvent 
> handler to transition to standby as the default reaction, *with shutdown as a 
> special case for certain types of failures.*
> {quote}
> But the author is *too optimistic when implement the patch.*
>  
> *What the Patch's solution:*
> So, for *conservative*, we would better *only transition to standby for the 
> failures in {color:#14892c}whitelist{color}:*
>  public enum RMFatalEventType {
>  {color:#14892c}// Source <- Store{color}
>  {color:#14892c}STATE_STORE_FENCED,{color}
>  {color:#14892c}STATE_STORE_OP_FAILED,{color}
> // Source <- Embedded Elector
>  EMBEDDED_ELECTOR_FAILED,
> {color:#14892c}// Source <- Admin Service{color}
>  {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}
> // Source <- Critical Thread Crash
>  CRITICAL_THREAD_CRASH
>  }
>  And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
> future added failure types, should crash RM, because we *cannot ensure* that 
> they will never cause RM cannot work in standby state, the *conservative* way 
> is to crash RM. Besides, after crash, the RM watchdog can know this and try 
> to repair the RM machine, send alerts, etc.






[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Release Note: Fix standby RM hangs (not retry or crash) forever due to 
forever lost from leader election. And now, RM will only transition to standby 
for known safe fatal events.  (was: Fix standby RM hangs (not retry or crash) 
forever due to forever lost from leader election. And now, RM will only 
transition to standby for known fatal events.)

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151.001.patch, yarn_rm.zip
>
>
> {color:#205081}*Issue Summary:*{color}
>  Standby RM hangs (not retry or crash) forever due to forever lost from 
> leader election
>  
> {color:#205081}*Issue Repro Steps:*{color}
>  # Start multiple RMs in HA mode
>  # Modify all hostnames in the zk connect string to different values in DNS. 
> (In reality, we need to replace old/bad zk machines to new/good zk machines, 
> so their DNS hostname will be changed.)
>  
> {color:#205081}*Issue Logs:*{color}
> The RM is BN4SCH101222318
> You can check the full RM log in attachment, yarn_rm.zip.
> To make it clear, the whole story is:
> {noformat}
> Join Election
> Win the leader (ZK Node Creation Callback)
>   Start to becomeActive 
> Start RMActiveServices 
> Start CommonNodeLabelsManager failed due to zk connect 
> UnknownHostException
> Stop CommonNodeLabelsManager
> Stop RMActiveServices
> Create and Init RMActiveServices
>   Fail to becomeActive 
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException 
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Already in standby state
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Found RMActiveServices's StandByTransitionRunnable object has already run 
> previously, so immediately return
>    
> (The standby RM failed to re-join the election, but it will never retry or 
> crash later, so afterwards no zk related logs and the standby RM is forever 
> hang.)
> {noformat}
> So, this should be a bug in RM, because *RM should always try to join 
> election* (give up join election should only happen on RM decide to crash), 
> otherwise, a RM without inside the election can never become active again and 
> start real works.
>  
> {color:#205081}*Caused By:*{color}
> It is introduced by YARN-3742
> The JIRA want to improve is that, when STATE_STORE_OP_FAILED RMFatalEvent 
> happens, RM should transition to standby, instead of crash.
>  *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition 
> to standby, instead of crash.* (In contrast, before this change, RM makes all 
> to crash instead of to standby)
>  So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it 
> will leave the standby RM continue not work, such as stay in standby forever.
> And as the author said:
> {quote}I think a good approach here would be to change the RMFatalEvent 
> handler to transition to standby as the default reaction, *with shutdown as a 
> special case for certain types of failures.*
> {quote}
> But the author is *too optimistic when implement the patch.*
>  
> {color:#205081}*What the Patch's solution:*{color}
> So, for *conservative*, we would better *only transition to standby for the 
> failures in {color:#14892c}whitelist{color}:*
>  public enum RMFatalEventType {
>  {color:#14892c}// Source <- Store{color}
>  {color:#14892c}STATE_STORE_FENCED,{color}
>  {color:#14892c}STATE_STORE_OP_FAILED,{color}
> // Source <- Embedded Elector
>  EMBEDDED_ELECTOR_FAILED,
> {color:#14892c}// Source <- Admin Service{color}
>  {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}
> // Source <- Critical Thread Crash
>  CRITICAL_THREAD_CRASH
>  }
> And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
> future added failure types, should crash RM, because we *cannot ensure* that 
> they will never cause RM cannot work in standby state, the *conservative* way 
> is to crash RM. Besides, after crash, the RM watchdog can know 

[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
{color:#205081}*Issue Summary:*{color}
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

{color:#205081}*Issue Repro Steps:*{color}
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines to new/good zk machines, so 
their DNS hostname will be changed.)

 

{color:#205081}*Issue Logs:*{color}

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eaten and only a transition-to-Standby event is sent)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException
  (Here the exception is eaten and only a transition-to-Standby event is sent)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to re-join the election, but it will never retry or 
crash later, so afterwards there are no zk related logs and the standby RM 
hangs forever.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the 
election* (giving up on joining the election should only happen when RM decides 
to crash); otherwise, an RM that is not in the election can never become active 
again and start real work.

 

{color:#205081}*Caused By:*{color}

It is introduced by YARN-3742

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED 
RMFatalEvent happens, RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to 
standby, instead of crash.* (In contrast, before this change, RM crashed on all 
of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will 
leave the standby RM unable to work, e.g. stuck in standby forever.

And as the author said:
{quote}I think a good approach here would be to change the RMFatalEvent handler 
to transition to standby as the default reaction, *with shutdown as a special 
case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

{color:#205081}*What the Patch's solution:*{color}

So, to be *conservative*, we had better *only transition to standby for the 
failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }

And the others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
failure types added in the future, should crash RM, because we *cannot ensure* 
that they will never leave RM unable to work in standby state; the 
*conservative* way is to crash RM. Besides, after a crash, the RM watchdog can 
detect this and try to repair the RM machine, send alerts, etc.

For more details, please check the patch.

  was:
*Issue Summary:*
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

*Issue Repro Steps:*
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines to new/good zk machines, so 
their DNS hostname will be changed.)

 

*Issue Logs:*

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to 

[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Fix Version/s: (was: 2.9.2)

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151.001.patch, yarn_rm.zip
>
>
> *Issue Summary:*
>  Standby RM hangs (not retry or crash) forever due to forever lost from 
> leader election
>  
> *Issue Repro Steps:*
>  # Start multiple RMs in HA mode
>  # Modify all hostnames in the zk connect string to different values in DNS. 
> (In reality, we need to replace old/bad zk machines to new/good zk machines, 
> so their DNS hostname will be changed.)
>  
> *Issue Logs:*
> The RM is BN4SCH101222318
> You can check the full RM log in attachment, yarn_rm.zip.
> To make it clear, the whole story is:
> {noformat}
> Join Election
> Win the leader (ZK Node Creation Callback)
>   Start to becomeActive 
> Start RMActiveServices 
> Start CommonNodeLabelsManager failed due to zk connect 
> UnknownHostException
> Stop CommonNodeLabelsManager
> Stop RMActiveServices
> Create and Init RMActiveServices
>   Fail to becomeActive 
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException 
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Already in standby state
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Found RMActiveServices's StandByTransitionRunnable object has already run 
> previously, so immediately return
>    
> (The standby RM failed to re-join the election, but it will never retry or 
> crash later, so afterwards no zk related logs and the standby RM is forever 
> hang.)
> {noformat}
> So, this should be a bug in RM, because *RM should always try to join 
> election* (give up join election should only happen on RM decide to crash), 
> otherwise, a RM without inside the election can never become active again and 
> start real works.
>  
> *Caused By:*
> It is introduced by YARN-3742
> The JIRA want to improve is that, when STATE_STORE_OP_FAILED RMFatalEvent 
> happens, RM should transition to standby, instead of crash.
>  *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition 
> to standby, instead of crash.* (In contrast, before this change, RM makes all 
> to crash instead of to standby)
>  So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it 
> will leave the standby RM continue not work, such as stay in standby forever.
> And as the author said:
> {quote}I think a good approach here would be to change the RMFatalEvent 
> handler to transition to standby as the default reaction, *with shutdown as a 
> special case for certain types of failures.*
> {quote}
> But the author is *too optimistic when implement the patch.*
>  
> *What the Patch's solution:*
> So, for *conservative*, we would better *only transition to standby for the 
> failures in {color:#14892c}whitelist{color}:*
>  public enum RMFatalEventType {
>  {color:#14892c}// Source <- Store{color}
>  {color:#14892c}STATE_STORE_FENCED,{color}
>  {color:#14892c}STATE_STORE_OP_FAILED,{color}
> // Source <- Embedded Elector
>  EMBEDDED_ELECTOR_FAILED,
> {color:#14892c}// Source <- Admin Service{color}
>  {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}
> // Source <- Critical Thread Crash
>  CRITICAL_THREAD_CRASH
>  }
>  And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
> future added failure types, should crash RM, because we *cannot ensure* that 
> they will never cause RM cannot work in standby state, the *conservative* way 
> is to crash RM. Besides, after crash, the RM watchdog can know this and try 
> to repair the RM machine, send alerts, etc.






[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
*Issue Summary:*
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

*Issue Repro Steps:*
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines to new/good zk machines, so 
their DNS hostname will be changed.)

 

*Issue Logs:*

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eaten and only a transition-to-Standby event is sent)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException
  (Here the exception is eaten and only a transition-to-Standby event is sent)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to re-join the election, but it will never retry or 
crash later, so afterwards there are no zk related logs and the standby RM 
hangs forever.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the 
election* (giving up on joining the election should only happen when RM decides 
to crash); otherwise, an RM that is not in the election can never become active 
again and start real work.

 

*Caused By:*

It is introduced by YARN-3742

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED 
RMFatalEvent happens, RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to 
standby, instead of crash.* (In contrast, before this change, RM crashed on all 
of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it will 
leave the standby RM unable to work, e.g. stuck in standby forever.

And as the author said:
{quote}I think a good approach here would be to change the RMFatalEvent handler 
to transition to standby as the default reaction, *with shutdown as a special 
case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

*What the Patch's solution:*

So, to be *conservative*, we had better *only transition to standby for the 
failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }

And the others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
failure types added in the future, should crash RM, because we *cannot ensure* 
that they will never leave RM unable to work in standby state; the 
*conservative* way is to crash RM. Besides, after a crash, the RM watchdog can 
detect this and try to repair the RM machine, send alerts, etc.

For more details, please check the patch.

  was:
*Issue Summary:*
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

*Issue Repro Steps:*
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines to new/good zk machines, so 
their DNS hostname will be changed.)

 

*Issue Logs:*

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  

[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Attachment: (was: YARN-9151-branch-2.9.2.001.patch)

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151.001.patch, yarn_rm.zip
>
>
> *Issue Summary:*
>  Standby RM hangs (not retry or crash) forever due to forever lost from 
> leader election
>  
> *Issue Repro Steps:*
>  # Start multiple RMs in HA mode
>  # Modify all hostnames in the zk connect string to different values in DNS. 
> (In reality, we need to replace old/bad zk machines to new/good zk machines, 
> so their DNS hostname will be changed.)
>  
> *Issue Logs:*
> The RM is BN4SCH101222318
> You can check the full RM log in attachment, yarn_rm.zip.
> To make it clear, the whole story is:
> {noformat}
> Join Election
> Win the leader (ZK Node Creation Callback)
>   Start to becomeActive 
> Start RMActiveServices 
> Start CommonNodeLabelsManager failed due to zk connect 
> UnknownHostException
> Stop CommonNodeLabelsManager
> Stop RMActiveServices
> Create and Init RMActiveServices
>   Fail to becomeActive 
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException 
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Already in standby state
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Found RMActiveServices's StandByTransitionRunnable object has already run 
> previously, so immediately return
>    
> (The standby RM failed to re-join the election, but it will never retry or 
> crash later, so afterwards no zk related logs and the standby RM is forever 
> hang.)
> {noformat}
> So, this should be a bug in RM, because *RM should always try to join 
> election* (give up join election should only happen on RM decide to crash), 
> otherwise, a RM without inside the election can never become active again and 
> start real works.
>  
> *Caused By:*
> It is introduced by YARN-3742
> The JIRA want to improve is that, when STATE_STORE_OP_FAILED RMFatalEvent 
> happens, RM should transition to standby, instead of crash.
>  *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition 
> to standby, instead of crash.* (In contrast, before this change, RM makes all 
> to crash instead of to standby)
>  So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it 
> will leave the standby RM continue not work, such as stay in standby forever.
> And as the author said:
> {quote}I think a good approach here would be to change the RMFatalEvent 
> handler to transition to standby as the default reaction, *with shutdown as a 
> special case for certain types of failures.*
> {quote}
> But the author is *too optimistic when implement the patch.*
>  
> *What the Patch's solution:*
> So, for *conservative*, we would better *only transition to standby for the 
> failures in {color:#14892c}whitelist{color}:*
>  public enum RMFatalEventType {
>  {color:#14892c}// Source <- Store{color}
>  {color:#14892c}STATE_STORE_FENCED,{color}
>  {color:#14892c}STATE_STORE_OP_FAILED,{color}
> // Source <- Embedded Elector
>  EMBEDDED_ELECTOR_FAILED,
> {color:#14892c}// Source <- Admin Service{color}
>  {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}
> // Source <- Critical Thread Crash
>  CRITICAL_THREAD_CRASH
>  }
>  And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
> future added failure types, should crash RM, because we *cannot ensure* that 
> they will never cause RM cannot work in standby state, the *conservative* way 
> is to crash RM. Besides, after crash, the RM watchdog can know this and try 
> to repair the RM machine, send alerts, etc.






[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Attachment: YARN-9151.001.patch

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151.001.patch, yarn_rm.zip
>
>
> *Issue Summary:*
>  Standby RM hangs (not retry or crash) forever due to forever lost from 
> leader election
>  
> *Issue Repro Steps:*
>  # Start multiple RMs in HA mode
>  # Modify all hostnames in the zk connect string to different values in DNS. 
> (In reality, we need to replace old/bad zk machines to new/good zk machines, 
> so their DNS hostname will be changed.)
>  
> *Issue Logs:*
> The RM is BN4SCH101222318
> You can check the full RM log in attachment, yarn_rm.zip.
> To make it clear, the whole story is:
> {noformat}
> Join Election
> Win the leader (ZK Node Creation Callback)
>   Start to becomeActive 
> Start RMActiveServices 
> Start CommonNodeLabelsManager failed due to zk connect 
> UnknownHostException
> Stop CommonNodeLabelsManager
> Stop RMActiveServices
> Create and Init RMActiveServices
>   Fail to becomeActive 
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException 
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Already in standby state
>   ReJoin Election
>   Failed to Join Election due to zk connect UnknownHostException
>   (Here the exception is eat and just send transition to Standby event)
>   Send RMFatalEvent to transition RM to standby
> Transitioning RM to Standby
>   Start StandByTransitionThread
>   Found RMActiveServices's StandByTransitionRunnable object has already run 
> previously, so immediately return
>    
> (The standby RM failed to re-join the election, but it will never retry or 
> crash later, so afterwards no zk related logs and the standby RM is forever 
> hang.)
> {noformat}
> So, this should be a bug in RM, because *RM should always try to join 
> election* (give up join election should only happen on RM decide to crash), 
> otherwise, a RM without inside the election can never become active again and 
> start real works.
>  
> *Caused By:*
> It is introduced by YARN-3742
> The JIRA want to improve is that, when STATE_STORE_OP_FAILED RMFatalEvent 
> happens, RM should transition to standby, instead of crash.
>  *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition 
> to standby, instead of crash.* (In contrast, before this change, RM makes all 
> to crash instead of to standby)
>  So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it 
> will leave the standby RM continue not work, such as stay in standby forever.
> And as the author said:
> {quote}I think a good approach here would be to change the RMFatalEvent 
> handler to transition to standby as the default reaction, *with shutdown as a 
> special case for certain types of failures.*
> {quote}
> But the author is *too optimistic when implement the patch.*
>  
> *What the Patch's solution:*
> So, for *conservative*, we would better *only transition to standby for the 
> failures in {color:#14892c}whitelist{color}:*
>  public enum RMFatalEventType {
>  {color:#14892c}// Source <- Store{color}
>  {color:#14892c}STATE_STORE_FENCED,{color}
>  {color:#14892c}STATE_STORE_OP_FAILED,{color}
> // Source <- Embedded Elector
>  EMBEDDED_ELECTOR_FAILED,
> {color:#14892c}// Source <- Admin Service{color}
>  {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}
> // Source <- Critical Thread Crash
>  CRITICAL_THREAD_CRASH
>  }
>  And others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
> future added failure types, should crash RM, because we *cannot ensure* that 
> they will never cause RM cannot work in standby state, the *conservative* way 
> is to crash RM. Besides, after crash, the RM watchdog can know this and try 
> to repair the RM machine, send alerts, etc.






[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Fix Version/s: (was: 3.1.1)

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151-branch-2.9.2.001.patch, yarn_rm.zip
>
>






[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Attachment: YARN-9151-branch-2.9.2.001.patch

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151-branch-2.9.2.001.patch, yarn_rm.zip
>
>






[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Fix Version/s: (was: 2.9.2)
   3.1.1

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151-branch-2.9.2.001.patch, yarn_rm.zip
>
>






[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Fix Version/s: 2.9.2

> Standby RM hangs (not retry or crash) forever due to forever lost from leader 
> election
> --
>
> Key: YARN-9151
> URL: https://issues.apache.org/jira/browse/YARN-9151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
>  Labels: patch
> Fix For: 3.1.1
>
> Attachments: YARN-9151-branch-2.9.2.001.patch, yarn_rm.zip
>
>






[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
*Issue Summary:*
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

*Issue Repro Steps:*
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines with new/good zk machines, 
so their DNS hostnames will change.)

 

*Issue Logs:*

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to re-join the election and never retries or crashes 
afterwards, so there are no further zk related logs and the standby RM hangs 
forever.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the 
election* (giving up the election should only happen when RM decides to crash); 
otherwise, an RM that is not in the election can never become active again and 
do real work.

 

*Caused By:*

It is introduced by YARN-3742.

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED 
RMFatalEvent happens, RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to 
standby, instead of crash.* (In contrast, before this change, RM crashed on all 
of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it can 
leave the standby RM not working, e.g. stuck in standby forever.

And as the author said:
{quote}I think a good approach here would be to change the RMFatalEvent handler 
to transition to standby as the default reaction, *with shutdown as a special 
case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

*The Patch's solution:*

So, to be *conservative*, we had better *only transition to standby for the 
failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 {color:#14892c}// Source <- Store{color}
 {color:#14892c}STATE_STORE_FENCED,{color}
 {color:#14892c}STATE_STORE_OP_FAILED,{color}

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}// Source <- Admin Service{color}
 {color:#14892c} TRANSITION_TO_ACTIVE_FAILED,{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }


 And the others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
failure types added in the future, should crash RM, because we *cannot ensure* 
that they will never leave RM unable to work in the standby state; the 
*conservative* way is to crash RM. Besides, after the crash, the RM watchdog can 
notice it and try to repair the RM machine, send alerts, etc.

  was:
*Issue Summary:*
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

*Issue Repro Steps:*
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines to new/good zk machines, so 
their DNS hostname will be changed.)

 

*Issue Logs:*

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due 

[jira] [Created] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)
Yuqi Wang created YARN-9151:
---

 Summary: Standby RM hangs (not retry or crash) forever due to 
forever lost from leader election
 Key: YARN-9151
 URL: https://issues.apache.org/jira/browse/YARN-9151
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.9.2
Reporter: Yuqi Wang
Assignee: Yuqi Wang
 Fix For: 3.1.1
 Attachments: yarn_rm.zip

*Issue Summary:*
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

*Issue Repro Steps:*
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines with new/good zk machines, 
so their DNS hostnames will change.)

 

*Issue Logs:*

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to re-join the election and never retries or crashes 
afterwards, so there are no further zk related logs and the standby RM hangs 
forever.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the 
election* (giving up the election should only happen when RM decides to crash); 
otherwise, an RM that is not in the election can never become active again and 
do real work.

 

*Caused By:*

It is introduced by YARN-3742.

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED 
RMFatalEvent happens, RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to 
standby, instead of crash.* (In contrast, before this change, RM crashed on all 
of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it can 
leave the standby RM not working, e.g. stuck in standby forever.

And as the author 
[said|https://issues.apache.org/jira/browse/YARN-3742?focusedCommentId=15891385=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15891385]:
{quote}I think a good approach here would be to change the RMFatalEvent handler 
to transition to standby as the default reaction, *with shutdown as a special 
case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

*The Patch's solution:*

So, to be *conservative*, we had better *only transition to standby for the 
failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 *{color:#14892c}// Source <- Store{color}*
 *{color:#14892c}STATE_STORE_FENCED,{color}*
 *{color:#14892c}STATE_STORE_OP_FAILED,{color}*

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}*// Source <- Admin Service*{color}
{color:#14892c} *TRANSITION_TO_ACTIVE_FAILED,*{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }
 And the others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
failure types added in the future, should crash RM, because we *cannot ensure* 
that they will never leave RM unable to work in the standby state; the 
*conservative* way is to crash RM. Besides, after the crash, the RM watchdog can 
notice it and try to repair the RM machine, send alerts, etc.






[jira] [Updated] (YARN-9151) Standby RM hangs (not retry or crash) forever due to forever lost from leader election

2018-12-19 Thread Yuqi Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-9151:

Description: 
*Issue Summary:*
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

 

*Issue Repro Steps:*
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines with new/good zk machines, 
so their DNS hostnames will change.)

 

*Issue Logs:*

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Found RMActiveServices's StandByTransitionRunnable object has already run 
previously, so immediately return
   
(The standby RM failed to re-join the election and never retries or crashes 
afterwards, so there are no further zk related logs and the standby RM hangs 
forever.)
{noformat}
So, this should be a bug in RM, because *RM should always try to join the 
election* (giving up the election should only happen when RM decides to crash); 
otherwise, an RM that is not in the election can never become active again and 
do real work.

 

*Caused By:*

It is introduced by YARN-3742.

What that JIRA wanted to improve is that, when a STATE_STORE_OP_FAILED 
RMFatalEvent happens, RM should transition to standby instead of crashing.
 *However, in fact, the JIRA makes ALL kinds of RMFatalEvent ONLY transition to 
standby, instead of crash.* (In contrast, before this change, RM crashed on all 
of them instead of going to standby.)
 So, even if EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH happens, it can 
leave the standby RM not working, e.g. stuck in standby forever.

And as the author said:
{quote}I think a good approach here would be to change the RMFatalEvent handler 
to transition to standby as the default reaction, *with shutdown as a special 
case for certain types of failures.*
{quote}
But the author was *too optimistic when implementing the patch.*

 

*The Patch's solution:*

So, to be *conservative*, we had better *only transition to standby for the 
failures in the {color:#14892c}whitelist{color}:*
 public enum RMFatalEventType {
 *{color:#14892c}// Source <- Store{color}*
 *{color:#14892c}STATE_STORE_FENCED,{color}*
 *{color:#14892c}STATE_STORE_OP_FAILED,{color}*

// Source <- Embedded Elector
 EMBEDDED_ELECTOR_FAILED,

{color:#14892c}*// Source <- Admin Service*{color}
 {color:#14892c} *TRANSITION_TO_ACTIVE_FAILED,*{color}

// Source <- Critical Thread Crash
 CRITICAL_THREAD_CRASH
 }
 And the others, such as EMBEDDED_ELECTOR_FAILED or CRITICAL_THREAD_CRASH and 
failure types added in the future, should crash RM, because we *cannot ensure* 
that they will never leave RM unable to work in the standby state; the 
*conservative* way is to crash RM. Besides, after the crash, the RM watchdog can 
notice it and try to repair the RM machine, send alerts, etc.

  was:
*Issue Summary:*
 Standby RM hangs (not retry or crash) forever due to forever lost from leader 
election

*Issue Repro Steps:*
 # Start multiple RMs in HA mode
 # Modify all hostnames in the zk connect string to different values in DNS. 
(In reality, we need to replace old/bad zk machines to new/good zk machines, so 
their DNS hostname will be changed.)

 

*Issue Logs:*

The RM is BN4SCH101222318

You can check the full RM log in attachment, yarn_rm.zip.

To make it clear, the whole story is:
{noformat}
Join Election
Win the leader (ZK Node Creation Callback)
  Start to becomeActive 
Start RMActiveServices 
Start CommonNodeLabelsManager failed due to zk connect UnknownHostException
Stop CommonNodeLabelsManager
Stop RMActiveServices
Create and Init RMActiveServices
  Fail to becomeActive 
  ReJoin Election
  Failed to Join Election due to zk connect UnknownHostException 
  (Here the exception is eat and just send transition to Standby event)
  Send RMFatalEvent to transition RM to standby
Transitioning RM to Standby
  Start StandByTransitionThread
  Already in standby state
  ReJoin Election
  Failed to Join Election 

[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-22 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410746#comment-16410746
 ] 

Yuqi Wang commented on YARN-7872:
-

Thanks [~leftnoteasy] for YARN-6592.

Does the "Rich placement constraints" can also be worked for the labeled nodes?

If yes, I think it may also help on this JIRA.

I will take a look in details later.

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> A labeled node (i.e. a node with a 'not empty' node label) cannot be used to 
> satisfy a locality-specified request (i.e. a container request with a 'not ANY' 
> resource name and relax locality set to false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> The current RM capacity scheduler's behavior (at least for versions 2.7 
> and 2.8) is that the node cannot allocate a container for the request, because 
> the node label does not match when the leaf queue assigns containers.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions for 
> selecting candidate nodes for a container request. And the node label matching 
> should only be executed for container requests with the ANY resource name, since 
> only this kind of container request is allowed to have a 'not empty' node label.
> So, for a container request with a 'not ANY' resource name (where we clearly 
> know it should not have a node label), we should use the requested resource 
> name to match against the node instead of using the requested node label. And 
> this resource name matching should be safe, since a node whose node label is 
> not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> The attachment is the fix according to this principle; please help to review.
> Without it, we cannot use locality to request containers on these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If it is not acceptable (i.e. the current behavior is by design), then how can 
> we use locality to request containers on these labeled nodes?
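To make the proposed matching rule concrete, here is a small sketch of the 
principle described under *Possible solution* (an illustration only, not the 
attached patch; the class and method names are made up, and ANY mirrors 
ResourceRequest.ANY):
{code:java}
// Illustration only: a simplified sketch of "match by resource name for
// host/rack requests, match by label only for ANY requests".
public class LocalityLabelMatchSketch {

  static final String ANY = "*";

  /**
   * Decide whether a node can be considered for a request.
   * - For ANY requests, the node's label must match the requested label
   *   (only ANY requests may carry a non-empty label).
   * - For host-specific requests (relaxLocality = false), match by the
   *   requested resource name; the label check is skipped because such
   *   requests cannot carry a label anyway.
   */
  static boolean nodeMatches(String requestedResourceName,
                             String requestedLabel,
                             String nodeHostName,
                             String nodeLabel) {
    if (ANY.equals(requestedResourceName)) {
      return normalize(requestedLabel).equals(normalize(nodeLabel));
    }
    return requestedResourceName.equals(nodeHostName);
  }

  private static String normalize(String label) {
    return label == null ? "" : label.trim();
  }

  public static void main(String[] args) {
    // Example from the description: node SRG labeled "persistent",
    // request for host SRG with no label and relaxLocality = false.
    System.out.println(nodeMatches("SRG", null, "SRG", "persistent")); // true
  }
}
{code}
Under this rule, the labeled node in the example above would be usable for the 
host-specific request, while ANY requests would still be filtered by label.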






[jira] [Commented] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-22 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410738#comment-16410738
 ] 

Yuqi Wang commented on YARN-8012:
-

Thanks [~jlowe], I get your point in your last paragraph, so let me refine the 
patch according to your suggestions. Much appreciated.

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it cannot be managed by YARN either, i.e. it is leaked.
> *There are many cases where a YARN-managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, for example because depended-on 
> configuration or resources cannot be made ready.
>  * NM local leveldb store is corrupted or lost, such as from bad disk sectors.
>  * NM has bugs, such as wrongly marking a live container as complete.
> Note: these cases are caused, or made worse, when work-preserving NM restart is 
> enabled; see YARN-1336
> *Bad impacts of an unmanaged container, such as:*
>  # Resources cannot be managed for YARN on the node:
>  ** Causes a YARN resource leak on the node
>  ** The container cannot be killed to release YARN resources on the node and 
> free them up for other urgent computations on the node.
>  # Container and App killing is not eventually consistent for the App user:
>  ** An App which has bugs can still produce bad impacts outside even if the 
> App was killed a long time ago






[jira] [Comment Edited] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-22 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409152#comment-16409152
 ] 

Yuqi Wang edited comment on YARN-8012 at 3/22/18 7:15 AM:
--

Thanks [~jlowe]:).
{quote}Having a periodic monitor per container makes sense for handling the 
case where the NM suddenly disappears. We already use a lingering process per 
container for NM restart, as we need to record the container exit code even 
when the NM is temporarily missing. It would be awesome if we could leverage 
that existing process rather than create yet another monitoring process to 
reduce the per-container overhead, but I understand the reluctance to do this 
in C for the native container executors.
{quote}
As mentioned in the doc, implementing it as another process in Java also helps 
all platforms' container executors (written in C, such as Windows winutils and 
the Linux container-executor) leverage the feature. And since it is platform 
independent, putting the feature inside every platform's container executor does 
not make sense.

 
{quote}It was unclear in the document that the "ping" to the NM was not an RPC 
call but a REST query.
{quote}
 
 * Currently, there seems to be no benefit of RPC (tcp proto request) over REST 
(http request). Do you see any benefits?
 * This is just the first stage of this feature; if it makes sense and works 
well in production, we can refine it to use RPC

 
{quote}Would be good to elaborate the details of how the checker monitors the 
NM.
{quote}
This is mainly elaborated in Section: UNMANAGED CONTAINER DETECTION 

 
{quote}I would rather not see all the configurations be windows specific. The 
design implies this isn't something only Windows can implement, and I'd hate 
there to be separate Windows, Linux, BSD, Solaris, etc. versions of all of 
these settings. If the setting doesn't work on a particular platform we can 
document the limitations in the property description.
{quote}
Agree. The configuration is Windows specific now because, for this patch, I 
only implemented the feature for Windows. 
 We can expand it after the first stage. However, we should also consider that 
on Windows it depends on DefaultContainerExecutor, while on Linux it depends on 
LinuxContainerExecutor.

 
{quote}How does the container monitor authenticate with the NM in a secure 
cluster setup?
{quote}
Do you mean Secure Container Executor?

I have not investigated the Secure Container Executor. But may support it after 
the 1st stage.

 
{quote}Will the overhead of the new UnmanagedContainerChecker process will be 
counted against the overall container resource usage?
{quote}
Yes, but it is hard to avoid, since we want the UnmanagedContainerChecker 
process to also be cleaned up after the container job object is killed. So it 
should be inside the container job object. And winutils is also inside the 
job object and is counted into container resource usage. I think the usage 
is very small and can be configured by the env: YARN_UCC_HEAPSIZE. So we can 
ignore it at least for the 1st stage.

 
{quote}I didn't follow the logic in the design document for why it doesn't make 
sense to retry launching the unmanaged monitor if it exits unexpectedly. It 
simply says, "Add the unmanaged container judgement logic (retrypolicy) in 
winutils is not suitable, it should be in UnmanagedContainerChecker." However 
this section is discussing how to handle an unexpected exit of 
UnmanagedContainerChecker, so why would it make sense to put the retry logic in 
the very thing we are retrying?
{quote}
Since YARN NM does not even retry the container executor process on unexpected 
exit, and it happens rarely, we can skip retrying the ucc process in the 
first stage. And if really required, we can add a retry policy in the batch 
start-yarn-ucc.cmd instead of winutils.

 
{quote}Does it really make sense to catch Throwable in the monitor loop? Seems 
like it would make more sense to have this localized to where we are 
communicating with the NM, otherwise it could easily suppress OOM errors or 
other non-exceptions that would be better handled by letting this process die 
and relaunching a replacement.
{quote}
Agree, but then something outside needs to retry the process.

Any thoughts for the whole feature? :)
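For reference, a rough sketch of what such a per-container checker loop could 
look like (purely illustrative; the class name, NM endpoint, and cleanup hook 
are assumptions rather than the actual YARN-8012 patch, and it ignores details 
like how long to tolerate an unreachable NM):
{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustration only: an "UnmanagedContainerChecker"-style monitor loop.
public class UnmanagedContainerCheckerSketch {

  private final String nmWebAddress; // e.g. "http://localhost:8042" (assumed)
  private final String containerId;  // the container this checker guards
  private final long intervalMs;

  public UnmanagedContainerCheckerSketch(String nmWebAddress,
      String containerId, long intervalMs) {
    this.nmWebAddress = nmWebAddress;
    this.containerId = containerId;
    this.intervalMs = intervalMs;
  }

  public void run() throws InterruptedException {
    while (true) {
      Boolean managed = null;
      try {
        // Exception handling is localized to the NM communication, as
        // suggested in the review; anything else is left to crash the
        // process so an outer retry policy can relaunch it.
        managed = isContainerKnownToNM();
      } catch (IOException e) {
        // NM temporarily unreachable: treat as unknown and keep polling.
      }
      if (Boolean.FALSE.equals(managed)) {
        cleanupContainer();
        return;
      }
      Thread.sleep(intervalMs);
    }
  }

  private boolean isContainerKnownToNM() throws IOException {
    // Assumed NM REST query; a non-200 answer would mean the NM no longer
    // knows this container, i.e. it has become unmanaged.
    URL url = new URL(nmWebAddress + "/ws/v1/node/containers/" + containerId);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try {
      conn.setRequestMethod("GET");
      return conn.getResponseCode() == 200;
    } finally {
      conn.disconnect();
    }
  }

  private void cleanupContainer() {
    // Placeholder: kill the container's job object / process tree here.
    System.out.println("Container " + containerId
        + " is unmanaged; cleaning it up.");
  }
}
{code}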

 


was (Author: yqwang):
Thanks [~jlowe]:).
{quote}Having a periodic monitor per container makes sense for handling the 
case where the NM suddenly disappears. We already use a lingering process per 
container for NM restart, as we need to record the container exit code even 
when the NM is temporarily missing. It would be awesome if we could leverage 
that existing process rather than create yet another monitoring process to 
reduce the per-container overhead, but I understand the reluctance to do this 
in C for the native container executors.
{quote}
As mentioned in the doc, implement as another process in Java can also help to 
make all 

[jira] [Commented] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-22 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409152#comment-16409152
 ] 

Yuqi Wang commented on YARN-8012:
-

Thanks [~jlowe]:).
{quote}Having a periodic monitor per container makes sense for handling the 
case where the NM suddenly disappears. We already use a lingering process per 
container for NM restart, as we need to record the container exit code even 
when the NM is temporarily missing. It would be awesome if we could leverage 
that existing process rather than create yet another monitoring process to 
reduce the per-container overhead, but I understand the reluctance to do this 
in C for the native container executors.
{quote}
As mentioned in the doc, implementing it as another process in Java also helps 
all platforms' container executors (written in C, such as Windows winutils and 
the Linux container-executor) leverage the feature. And since it is platform 
independent, putting the feature inside every platform's container executor does 
not make sense.
{quote}It was unclear in the document that the "ping" to the NM was not an RPC 
call but a REST query.
{quote}
 
 * Currently, there seems to be no benefit of RPC (tcp proto request) over REST 
(http request). Do you see any benefits?
 * This is just the first stage of this feature; if it makes sense and works 
well in production, we can refine it to use RPC.

{quote}Would be good to elaborate the details of how the checker monitors the 
NM.
{quote}
This is mainly elaborated in Section: UNMANAGED CONTAINER DETECTION 

 
{quote}I would rather not see all the configurations be windows specific. The 
design implies this isn't something only Windows can implement, and I'd hate 
there to be separate Windows, Linux, BSD, Solaris, etc. versions of all of 
these settings. If the setting doesn't work on a particular platform we can 
document the limitations in the property description.
{quote}
Agree. The configuration is Windows specific now because, for this patch, I 
only implemented the feature for Windows. 
We can expand it after the first stage. However, we should also consider that 
on Windows it depends on DefaultContainerExecutor, while on Linux it depends on 
LinuxContainerExecutor.
{quote}How does the container monitor authenticate with the NM in a secure 
cluster setup?
{quote}
Do you mean Secure Container Executor?

I have not investigated the Secure Container Executor. But may support it after 
the 1st stage.
{quote}Will the overhead of the new UnmanagedContainerChecker process will be 
counted against the overall container resource usage?
{quote}
Yes, but it is hard to avoid, since we want the UnmanagedContainerChecker 
process to also be cleaned up after the container job object is killed. So it 
should be inside the container job object. And winutils is also inside the 
job object and is counted into container resource usage. I think the usage 
is very small and can be configured by the env: YARN_UCC_HEAPSIZE. So we can 
ignore it at least for the 1st stage.

 
{quote}I didn't follow the logic in the design document for why it doesn't make 
sense to retry launching the unmanaged monitor if it exits unexpectedly. It 
simply says, "Add the unmanaged container judgement logic (retrypolicy) in 
winutils is not suitable, it should be in UnmanagedContainerChecker." However 
this section is discussing how to handle an unexpected exit of 
UnmanagedContainerChecker, so why would it make sense to put the retry logic in 
the very thing we are retrying?
{quote}
Since YARN NM does not even retry the container executor process on unexpected 
exit, and it happens rarely, we can add a retry policy in the batch 
start-yarn-ucc.cmd instead of winutils.
{quote}Does it really make sense to catch Throwable in the monitor loop? Seems 
like it would make more sense to have this localized to where we are 
communicating with the NM, otherwise it could easily suppress OOM errors or 
other non-exceptions that would be better handled by letting this process die 
and relaunching a replacement.
{quote}
Agree, but then something outside needs to retry the process.

Any thoughts for the whole feature? :)

 

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM 

[jira] [Comment Edited] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-21 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407500#comment-16407500
 ] 

Yuqi Wang edited comment on YARN-7872 at 3/21/18 6:02 AM:
--

For deep learning jobs, performance is very important, so locality 
(such as InfiniBand) is also very important.

But these jobs also want to use labeled nodes, such as nodes labeled with FPGA 
(type), GPU (type), etc.


was (Author: yqwang):
For deep learning jobs, the performance is very important, so the locality is 
also very important.

But these jobs also want to use labeled nodes, such as labeled with FPGA 
(Type), GPU (Type) etc.

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>






[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-21 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407500#comment-16407500
 ] 

Yuqi Wang commented on YARN-7872:
-

For deep learning jobs, the performance is very important, so the locality is 
also very important.

But these jobs also want to use labeled nodes, such as labeled with FPGA 
(Type), GPU (Type) etc.

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>






[jira] [Comment Edited] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-20 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407493#comment-16407493
 ] 

Yuqi Wang edited comment on YARN-7872 at 3/21/18 5:56 AM:
--

Thanks [~leftnoteasy].
{quote}Existing the patch is not backward compatible: it breaks behavior of an 
app requests locality + node partition. Before your patch, the behavior is, if 
requested locality is under requested partition, it can be allocated, otherwise 
it will keep in pending state. After your patch, requested partition will be 
silently ignored, which is not ideal. And it breaks how we calculate pending 
resource of each partition.
{quote}
It seems the behavior is that YARN just *does not allow* specifying node locality + 
node partition together, so it is meaningless to talk about "ignoring one and 
taking the other". Please check the code below in both 2.7 and trunk:

 
{code:java}
// we don't allow specify label expression other than resourceName=ANY now
if (!ResourceRequest.ANY.equals(resReq.getResourceName())
&& labelExp != null && !labelExp.trim().isEmpty()) {
  throw new InvalidLabelResourceRequestException(
  "Invalid resource request, queue=" + queueInfo.getQueueName()
  + " specified node label expression in a "
  + "resource request has resource name = "
  + resReq.getResourceName());
}
{code}
 

 

So, I think the behavior is the same before and after the patch, i.e.:

If an app requests locality + node partition, an 
InvalidLabelResourceRequestException will be thrown (the request fails).

 

What the patch does is allow the user to just specify locality, without 
specifying a node label, to request one specific labeled node.

 

The pending resource of each partition is really a good concern.

Is there any plan to fully support node locality working with node labels?

 

Thanks again :)

 

 


was (Author: yqwang):
Thanks [~leftnoteasy].
{quote}Existing the patch is not backward compatible: it breaks behavior of an 
app requests locality + node partition. Before your patch, the behavior is, if 
requested locality is under requested partition, it can be allocated, otherwise 
it will keep in pending state. After your patch, requested partition will be 
silently ignored, which is not ideal. And it breaks how we calculate pending 
resource of each partition.
{quote}
It seems the behavior is that YARN simply *does not allow* specifying node locality + 
node partition together, so it is meaningless to talk about "ignoring one and 
taking the other". Please check the code below, which is in both 2.7 and trunk:

 
{code:java}
// we don't allow specify label expression other than resourceName=ANY now
if (!ResourceRequest.ANY.equals(resReq.getResourceName())
&& labelExp != null && !labelExp.trim().isEmpty()) {
  throw new InvalidLabelResourceRequestException(
  "Invalid resource request, queue=" + queueInfo.getQueueName()
  + " specified node label expression in a "
  + "resource request has resource name = "
  + resReq.getResourceName());
}
{code}
 

 

So, I think the behavior is the same before and after the patch, i.e.:

If an app requests locality + node partition, an 
InvalidLabelResourceRequestException will be thrown (the request fails).

 

What the patch does is allow the user to specify only locality, without 
specifying a node label, to select one specific node.

 

The pending resource per partition is indeed a valid concern.

Is there any plan to fully support node locality working together with node labels?

 

Thanks again :)

 

 

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for 

[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-20 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407493#comment-16407493
 ] 

Yuqi Wang commented on YARN-7872:
-

Thanks [~leftnoteasy].
{quote}Existing the patch is not backward compatible: it breaks behavior of an 
app requests locality + node partition. Before your patch, the behavior is, if 
requested locality is under requested partition, it can be allocated, otherwise 
it will keep in pending state. After your patch, requested partition will be 
silently ignored, which is not ideal. And it breaks how we calculate pending 
resource of each partition.
{quote}
It seems the behavior is that YARN simply *does not allow* specifying node locality + 
node partition together, so it is meaningless to talk about "ignoring one and 
taking the other". Please check the code below, which is in both 2.7 and trunk:

 
{code:java}
// we don't allow specify label expression other than resourceName=ANY now
if (!ResourceRequest.ANY.equals(resReq.getResourceName())
&& labelExp != null && !labelExp.trim().isEmpty()) {
  throw new InvalidLabelResourceRequestException(
  "Invalid resource request, queue=" + queueInfo.getQueueName()
  + " specified node label expression in a "
  + "resource request has resource name = "
  + resReq.getResourceName());
}
{code}
 

 

So, I think the behavior is the same before and after the patch, i.e.:

If an app requests locality + node partition, an 
InvalidLabelResourceRequestException will be thrown (the request fails).

 

What the patch does is allow the user to specify only locality, without 
specifying a node label, to select one specific node.

 

The pending resource per partition is indeed a valid concern.

Is there any plan to fully support node locality working together with node labels?

 

Thanks again :)

 

 

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for the request, because the 
> node label is not matched when the leaf queue assign container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we clearly know 
> it should not have node label), we should use the requested resource name to 
> match with the node instead of using the requested node label to match with 
> the node. And this resource name matching should be safe, since the node 
> whose node label is not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If not acceptable (i.e. the current behavior is by designed), so, how can we 
> use locality to request container within these labeled nodes?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-20 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407438#comment-16407438
 ] 

Yuqi Wang edited comment on YARN-7872 at 3/21/18 4:22 AM:
--

{quote}If not acceptable (i.e. the current behavior is by designed), so, how 
can we use locality to request container within these labeled nodes?
{quote}
My problem here is that, given the two nodes in my previous comment, how can 
I ask for only Node A (NodeLocality = HostName A and RelaxLocality = false) 
instead of asking for both A and B (NodeLabel = persistent)?


was (Author: yqwang):
{quote}If not acceptable (i.e. the current behavior is by designed), so, how 
can we use locality to request container within these labeled nodes?
{quote}
My problem here is that, given the two nodes in my previous comment, how can 
I ask for only Node A (Locality = HostName A and RelaxLocality = false) 
instead of asking for both A and B with nodelabel = persistent?

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for the request, because the 
> node label is not matched when the leaf queue assign container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we clearly know 
> it should not have node label), we should use the requested resource name to 
> match with the node instead of using the requested node label to match with 
> the node. And this resource name matching should be safe, since the node 
> whose node label is not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If not acceptable (i.e. the current behavior is by designed), so, how can we 
> use locality to request container within these labeled nodes?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-20 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407438#comment-16407438
 ] 

Yuqi Wang edited comment on YARN-7872 at 3/21/18 4:22 AM:
--

{quote}If not acceptable (i.e. the current behavior is by designed), so, how 
can we use locality to request container within these labeled nodes?
{quote}
My problem here is that, given the two nodes in my previous comment, how can 
I ask for only Node A (NodeLocality = HostName A and RelaxLocality = false) 
instead of asking for both A and B (NodeLabel = persistent)?


was (Author: yqwang):
{quote}If not acceptable (i.e. the current behavior is by designed), so, how 
can we use locality to request container within these labeled nodes?
{quote}
My problem here is that, given the two nodes in my previous comment, how can 
I ask for only Node A (NodeLocality = HostName A and RelaxLocality = false) 
instead of asking for both A and B (NodeLabel = persistent)?

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for the request, because the 
> node label is not matched when the leaf queue assign container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we clearly know 
> it should not have node label), we should use the requested resource name to 
> match with the node instead of using the requested node label to match with 
> the node. And this resource name matching should be safe, since the node 
> whose node label is not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If not acceptable (i.e. the current behavior is by designed), so, how can we 
> use locality to request container within these labeled nodes?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-20 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407438#comment-16407438
 ] 

Yuqi Wang commented on YARN-7872:
-

{quote}If not acceptable (i.e. the current behavior is by designed), so, how 
can we use locality to request container within these labeled nodes?
{quote}
My problem is that, given the two nodes in my previous comment, how can I 
ask for only Node A (Locality = HostName A and RelaxLocality = false) 
instead of asking for both A and B with nodelabel = persistent?

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for the request, because the 
> node label is not matched when the leaf queue assign container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we clearly know 
> it should not have node label), we should use the requested resource name to 
> match with the node instead of using the requested node label to match with 
> the node. And this resource name matching should be safe, since the node 
> whose node label is not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If not acceptable (i.e. the current behavior is by designed), so, how can we 
> use locality to request container within these labeled nodes?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-20 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407438#comment-16407438
 ] 

Yuqi Wang edited comment on YARN-7872 at 3/21/18 4:21 AM:
--

{quote}If not acceptable (i.e. the current behavior is by designed), so, how 
can we use locality to request container within these labeled nodes?
{quote}
My problem here is that, given the two nodes in my previous comment, how can 
I ask for only Node A (Locality = HostName A and RelaxLocality = false) 
instead of asking for both A and B with nodelabel = persistent?


was (Author: yqwang):
{quote}If not acceptable (i.e. the current behavior is by designed), so, how 
can we use locality to request container within these labeled nodes?
{quote}
My problem is that, given the two nodes in my previous comment, how can I 
ask for only Node A (Locality = HostName A and RelaxLocality = false) 
instead of asking for both A and B with nodelabel = persistent?

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for the request, because the 
> node label is not matched when the leaf queue assign container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we clearly know 
> it should not have node label), we should use the requested resource name to 
> match with the node instead of using the requested node label to match with 
> the node. And this resource name matching should be safe, since the node 
> whose node label is not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If not acceptable (i.e. the current behavior is by designed), so, how can we 
> use locality to request container within these labeled nodes?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-20 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407433#comment-16407433
 ] 

Yuqi Wang commented on YARN-7872:
-

Thanks [~jlowe] for the reply :)

I agree that node label is used to partition nodes in the cluster. 

This patch does not break that, since the user can still use *node label* to 
select the nodes he wants.

What the patch additionally provides is that it also allows the user to use *node 
locality* (such as a specific node) to select nodes within a specific node label.

For example, we have 2 nodes, 

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{A}{color} RackName: 
\{/default-rack}]

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{B}{color} RackName: 
\{/default-rack}]

Before this patch, the user cannot ask for only node A, since node A has the 
node label 'persistent' and YARN does not allow specifying locality together 
with a node label.

After this patch, the user can ask for only node A.

So node locality and node label can be orthogonal, which gives the user more 
control over node selection.

 

Besides, the queue is used to control which user can access which node label 
and with how much capacity. This patch does not break that either, since only 
the nodes (node labels) accessible to the target queue are sent into 
LeafQueue#assignContainer.

So the ACL is still under control, and the patch only relaxes the rule that 
"labeled nodes can only be allocated for labeled requests".

Overall, my point is that node locality and node label can be orthogonal 
dimensions for the user to select nodes.
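
As a concrete illustration of that controllability (an AM-side sketch only; the 
host name, sizes and priority are made up, and this is not code from the patch), 
asking for only node A would look roughly like this:

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class OnlyNodeASketch {
  public static void main(String[] args) {
    // Strict locality: host "A" only, relaxLocality = false, and no node
    // label expression on this host-specific request.
    ContainerRequest onlyNodeA = new ContainerRequest(
        Resource.newInstance(1024, 1),
        new String[] {"A"},     // the labeled node we want
        null,                   // no rack constraint
        Priority.newInstance(1),
        false);                 // relaxLocality = false
    System.out.println(onlyNodeA);
    // Without the patch this request stays pending, because node A carries
    // the 'persistent' label and label matching filters it out; with the
    // patch, the locality match lets node A serve it.
  }
}
{code}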

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for the request, because the 
> node label is not matched when the leaf queue assign container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we clearly know 
> it should not have node label), we should use the requested resource name to 
> match with the node instead of using the requested node label to match with 
> the node. And this resource name matching should be safe, since the node 
> whose node label is not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If not acceptable (i.e. the current behavior is by designed), so, how can we 
> use locality to request container within these labeled nodes?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-20 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406156#comment-16406156
 ] 

Yuqi Wang commented on YARN-7872:
-

[~jlowe], could you please also take a look at this?
It is only a small change, and it seems trunk has the same issue as well.

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for the request, because the 
> node label is not matched when the leaf queue assign container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we clearly know 
> it should not have node label), we should use the requested resource name to 
> match with the node instead of using the requested node label to match with 
> the node. And this resource name matching should be safe, since the node 
> whose node label is not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If not acceptable (i.e. the current behavior is by designed), so, how can we 
> use locality to request container within these labeled nodes?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-20 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406147#comment-16406147
 ] 

Yuqi Wang commented on YARN-8012:
-

[~jlowe], I have uploaded the design doc for this JIRA, please check :)

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Note, they are caused or things become worse if work-preserving NM restart 
> enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node to free up 
> resource for other urgent computations on the node.
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-20 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Attachment: YARN-8012 - Unmanaged Container Cleanup.pdf

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Note, they are caused or things become worse if work-preserving NM restart 
> enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node to free up 
> resource for other urgent computations on the node.
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-19 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404360#comment-16404360
 ] 

Yuqi Wang edited comment on YARN-8012 at 3/19/18 6:55 AM:
--

[~jlowe], [~bikassaha], could you please take a look at this? :)

This can also help resolve YARN-2047, and it does not need a state store.

Appreciate your insights!


was (Author: yqwang):
[~jlowe], [~bikassaha], could you please take a look at this? :)

Appreciate your insights!

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Note, they are caused or things become worse if work-preserving NM restart 
> enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node to free up 
> resource for other urgent computations on the node.
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-18 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404360#comment-16404360
 ] 

Yuqi Wang commented on YARN-8012:
-

[~jlowe], [~bikassaha], could you please take a look at this? :)

Appreciate your insights!

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Note, they are caused or things become worse if work-preserving NM restart 
> enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node to free up 
> resource for other urgent computations on the node.
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-2047) RM should honor NM heartbeat expiry after RM restart

2018-03-18 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404357#comment-16404357
 ] 

Yuqi Wang edited comment on YARN-2047 at 3/19/18 4:15 AM:
--

[~bikassaha] and [~hex108], could you please check YARN-8012? It can help 
resolve this issue, and it does not need a state store.


was (Author: yqwang):
[~bikassaha] and [~hex108], could you please check YARN-8012? It can help 
resolve this issue.

> RM should honor NM heartbeat expiry after RM restart
> 
>
> Key: YARN-2047
> URL: https://issues.apache.org/jira/browse/YARN-2047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Priority: Major
>
> After the RM restarts, it forgets about existing NM's (and their potentially 
> decommissioned status too). After restart, the RM cannot maintain the 
> contract to the AM's that a lost NM's containers will be marked finished 
> within the expiry time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2047) RM should honor NM heartbeat expiry after RM restart

2018-03-18 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404357#comment-16404357
 ] 

Yuqi Wang commented on YARN-2047:
-

[~bikassaha] and [~hex108], could you please check YARN-8012? It can help 
resolve this issue.

> RM should honor NM heartbeat expiry after RM restart
> 
>
> Key: YARN-2047
> URL: https://issues.apache.org/jira/browse/YARN-2047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Priority: Major
>
> After the RM restarts, it forgets about existing NM's (and their potentially 
> decommissioned status too). After restart, the RM cannot maintain the 
> contract to the AM's that a lost NM's containers will be marked finished 
> within the expiry time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container which is no longer 
managed by the NM. Thus, it can no longer be managed by YARN either, i.e. it is 
also leaked from YARN's point of view.

*There are many cases a YARN managed container can become unmanaged, such as:*
 * NM service is disabled or removed on the node.
 * NM is unable to start up again on the node, such as depended configuration, 
or resources cannot be ready.
 * NM local leveldb store is corrupted or lost, such as bad disk sectors.
 * NM has bugs, such as wrongly mark live container as complete.

Note, they are caused or things become worse if work-preserving NM restart 
enabled, see YARN-1336

*Bad impacts of unmanaged container, such as:*
 # Resource cannot be managed for YARN on the node:
 ** Cause YARN on the node resource leak
 ** Cannot kill the container to release YARN resource on the node to free up 
resource for other urgent computations on the node.
 # Container and App killing is not eventually consistent for App user:
 ** App which has bugs can still produce bad impacts to outside even if the App 
is killed for a long time

  was:
An *unmanaged container / leaked container* is a container which is no longer 
managed by the NM. Thus, it can no longer be managed by YARN either, i.e. it is 
also leaked from YARN's point of view.

*There are many cases a YARN managed container can become unmanaged, such as:*
 * NM service is disabled or removed on the node.
 * NM is unable to start up again on the node, such as depended configuration, 
or resources cannot be ready.
 * NM local leveldb store is corrupted or lost, such as bad disk sectors.
 * NM has bugs, such as wrongly mark live container as complete.

Note, they are caused or things become worse if work-preserving NM restart 
enabled, see YARN-1336

*Bad impacts of unmanaged container, such as:*
 # Resource cannot be managed for YARN on the node:
 ** Cause YARN on the node resource leak
 ** Cannot kill the container to release YARN resource on the node to free up 
resource for other native processes on the node.
 # Container and App killing is not eventually consistent for App user:
 ** App which has bugs can still produce bad impacts to outside even if the App 
is killed for a long time


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Note, they are caused or things become worse if work-preserving NM restart 
> enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node to free up 
> resource for other urgent computations on the node.
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container which is no longer 
managed by the NM. Thus, it can no longer be managed by YARN either, i.e. it is 
also leaked from YARN's point of view.

*There are many cases a YARN managed container can become unmanaged, such as:*
 * NM service is disabled or removed on the node.
 * NM is unable to start up again on the node, such as depended configuration, 
or resources cannot be ready.
 * NM local leveldb store is corrupted or lost, such as bad disk sectors.
 * NM has bugs, such as wrongly mark live container as complete.

Note, they are caused or things become worse if work-preserving NM restart 
enabled, see YARN-1336

*Bad impacts of unmanaged container, such as:*
 # Resource cannot be managed for YARN on the node:
 ** Cause YARN on the node resource leak
 ** Cannot kill the container to release YARN resource on the node to free up 
resource for other native processes on the node.
 # Container and App killing is not eventually consistent for App user:
 ** App which has bugs can still produce bad impacts to outside even if the App 
is killed for a long time

  was:
An *unmanaged container / leaked container* is a container which is no longer 
managed by the NM. Thus, it can no longer be managed by YARN either, i.e. it is 
also leaked from YARN's point of view.

*There are many cases a YARN managed container can become unmanaged, such as:*
 * NM service is disabled or removed on the node.
 * NM is unable to start up again on the node, such as depended configuration, 
or resources cannot be ready.
 * NM local leveldb store is corrupted or lost, such as bad disk sectors.
 * NM has bugs, such as wrongly mark live container as complete.

Note, they are caused or things become worse if work-preserving NM restart 
enabled, see YARN-1336

*Bad impacts of unmanaged container, such as:*
 # Resource cannot be managed for YARN on the node:
 ** Cause YARN on the node resource leak
 ** Cannot kill the container to release YARN resource on the node
 # Container and App killing is not eventually consistent for App user:
 ** App which has bugs can still produce bad impacts to outside even if the App 
is killed for a long time


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Note, they are caused or things become worse if work-preserving NM restart 
> enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node to free up 
> resource for other native processes on the node.
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container which is no longer 
managed by the NM. Thus, it can no longer be managed by YARN either, i.e. it is 
also leaked from YARN's point of view.

*There are many cases a YARN managed container can become unmanaged, such as:*
 * NM service is disabled or removed on the node.
 * NM is unable to start up again on the node, such as depended configuration, 
or resources cannot be ready.
 * NM local leveldb store is corrupted or lost, such as bad disk sectors.
 * NM has bugs, such as wrongly mark live container as complete.

Note, they are caused or things become worse if work-preserving NM restart 
enabled, see YARN-1336

*Bad impacts of unmanaged container, such as:*
 # Resource cannot be managed for YARN on the node:
 ** Cause YARN on the node resource leak
 ** Cannot kill the container to release YARN resource on the node
 # Container and App killing is not eventually consistent for App user:
 ** App which has bugs can still produce bad impacts to outside even if the App 
is killed for a long time

  was:
An *unmanaged container / leaked container* is a container which is no longer 
managed by the NM. Thus, it can no longer be managed by YARN either, i.e. it is 
also leaked from YARN's point of view.

*There are many cases a YARN managed container can become unmanaged, such as:*
 * NM service is disabled or removed on the node.
 * NM is unable to start up again on the node, such as depended configuration, 
or resources cannot be ready.
 * NM local leveldb store is corrupted or lost, such as bad disk sectors.
 * NM has bugs, such as wrongly mark live container as complete.

Things become worse if work-preserving NM restart enabled, see YARN-1336

*Bad impacts of unmanaged container, such as:*
 # Resource cannot be managed for YARN on the node:
 ** Cause YARN on the node resource leak
 ** Cannot kill the container to release YARN resource on the node
 # Container and App killing is not eventually consistent for App user:
 ** App which has bugs can still produce bad impacts to outside even if the App 
is killed for a long time


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Note, they are caused or things become worse if work-preserving NM restart 
> enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390782#comment-16390782
 ] 

Yuqi Wang edited comment on YARN-8012 at 3/8/18 6:14 AM:
-

*Initial patch for review:*

For the initial patch, the unmanaged container cleanup feature on Windows can 
only clean up the container job object of the unmanaged container.

{color:#f79232}The UT will be added if the design is agreed.{color}

The current container will be considered unmanaged when (see the sketch after 
this list):
 # NM is dead:
 ** Failed to check whether the container is managed by NM within the timeout.
 # NM is alive but the container is 
org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found:
 ** The container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE 
or not found in the NM container list.
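
A rough sketch of that decision logic follows. It is illustrative only: the 
lister interface and all names here are hypothetical, not the patch's actual 
classes.

{code:java}
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerState;

public class UnmanagedCheckSketch {

  /** Hypothetical view of the local NM's container list. */
  interface NmContainerLister {
    Map<ContainerId, ContainerState> listContainers(long timeoutMs) throws IOException;
  }

  static boolean isUnmanaged(ContainerId id, NmContainerLister nm, long timeoutMs) {
    Map<ContainerId, ContainerState> containers;
    try {
      containers = nm.listContainers(timeoutMs);
    } catch (IOException e) {
      // Case 1: NM is dead -- the check did not succeed within the timeout.
      return true;
    }
    ContainerState state = containers.get(id);
    // Case 2: NM is alive but the container is COMPLETE or not in its list.
    return state == null || state == ContainerState.COMPLETE;
  }
}
{code}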


was (Author: yqwang):
*Initial patch for review:*

For the initial patch, the unmanaged container cleanup feature on Windows, only 
can cleanup the container job object of the unmanaged container. 
{color:#f79232}{color:#59afe1}Cleanup for more container resources will be 
supported. And the UT will be added if the design is agreed.{color}{color}

The current container will be considered as unmanaged when:
 # NM is dead:
 ** Failed to check whether container is managed by NM within timeout.
 # NM is alive but container is
 org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE
 or not found:
 ** The container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE 
or
 not found in the NM container list.

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Things become worse if work-preserving NM restart enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390782#comment-16390782
 ] 

Yuqi Wang edited comment on YARN-8012 at 3/8/18 6:13 AM:
-

*Initial patch for review:*

For the initial patch, the unmanaged container cleanup feature on Windows can only clean up the container job object of the unmanaged container.
{color:#59afe1}Cleanup for more container resources will be supported, and the UT will be added once the design is agreed.{color}

The current container will be considered unmanaged when:
 # NM is dead:
 ** Failed to check whether the container is managed by the NM within the timeout.
 # NM is alive but the container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found:
 ** The container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found in the NM container list.


was (Author: yqwang):
*Initial patch for review:*

For the initial patch, the unmanaged container cleanup feature on Windows, only 
can cleanup the container job object of the unmanaged container. 
{color:#59afe1}Cleanup for more container resources will be supported. And the 
UT will be added if the design is agreed.{color}

The current container will be considered as unmanaged when:
 # NM is dead:
 ** Failed to check whether container is managed by NM within timeout.
 # NM is alive but container is
org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE
or not found:
 ** The container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE 
or
not found in the NM container list.

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Things become worse if work-preserving NM restart enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container which is no longer managed by the NM. Thus, it can no longer be managed by YARN either; it is leaked from YARN's point of view.

*There are many cases in which a YARN-managed container can become unmanaged, such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, for example because its dependent configuration or resources cannot be made ready.
 * The NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
 * The NM has bugs, such as wrongly marking a live container as complete.

Things become worse if work-preserving NM restart is enabled; see YARN-1336.

*Bad impacts of an unmanaged container include:*
 # Resources on the node can no longer be managed by YARN:
 ** Causes a YARN resource leak on the node.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and App killing is not eventually consistent for the App user:
 ** An App which has bugs can still produce bad impacts outside even long after the App has been killed.

  was:
An *unmanaged container / leaked container* is a container which is no longer 
managed by NM. Thus, it is cannot be managed / leaked by YARN, too.

*There are many cases a YARN managed container can become unmanaged, such as:*
 * NM service is disabled or removed on the node.
 * NM is unable to start up again on the node, such as depended configuration, 
or resources cannot be ready.
 * NM local leveldb store is corrupted or lost, such as bad disk sectors.
 * NM has bugs, such as wrongly mark live container as complete.

Things become worse if work-preserving NM restart enabled, see 
[YARN-1336|https://issues.apache.org/jira/browse/YARN-1336]

*Bad impacts of unmanaged container, such as:*
 # Resource cannot be managed for YARN and the node:
 ** Cause YARN and node resource leak
 ** Cannot kill the container to release YARN resource on the node
 # Container and App killing is not eventually consistent for user:
 ** App which has bugs can still produce bad impacts to outside even if the App 
is killed for a long time


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Things become worse if work-preserving NM restart enabled, see YARN-1336
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN on the node:
>  ** Cause YARN on the node resource leak
>  ** Cannot kill the container to release YARN resource on the node
>  # Container and App killing is not eventually consistent for App user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container which is no longer managed by the NM. Thus, it can no longer be managed by YARN either; it is leaked from YARN's point of view.

*There are many cases in which a YARN-managed container can become unmanaged, such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, for example because its dependent configuration or resources cannot be made ready.
 * The NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
 * The NM has bugs, such as wrongly marking a live container as complete.

Things become worse if work-preserving NM restart is enabled; see [YARN-1336|https://issues.apache.org/jira/browse/YARN-1336].

*Bad impacts of an unmanaged container include:*
 # Resources cannot be managed for YARN and the node:
 ** Causes a YARN and node resource leak.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and App killing is not eventually consistent for the user:
 ** An App which has bugs can still produce bad impacts outside even long after the App has been killed.

  was:
An *unmanaged container* is a container which is no longer managed by NM. Thus, 
it is cannot be managed by YARN, too.

*There are many cases a YARN managed container can become unmanaged, such as:*
 # For container resource managed by YARN, such as container job object
 and disk data:
 ** NM service is disabled or removed on the node.
 ** NM is unable to start up again on the node, such as depended configuration, 
or resources cannot be ready.
 ** NM local leveldb store is corrupted or lost, such as bad disk sectors.
 ** NM has bugs, such as wrongly mark live container as complete.
 #  For container resource unmanaged by YARN:
 ** User breakaway processes from container job object.
 ** User creates VMs from container job object.
 ** User acquires other resource on the machine which is unmanaged by
 YARN, such as produce data outside Container folder.

*Bad impacts of unmanaged container, such as:*
 # Resource cannot be managed for YARN and the node:
 ** Cause YARN and node resource leak
 ** Cannot kill the container to release YARN resource on the node
 # Container and App killing is not eventually consistent for user:
 ** App which has bugs can still produce bad impacts to outside even if the App 
is killed for a long time


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container which is no longer 
> managed by NM. Thus, it is cannot be managed / leaked by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  * NM service is disabled or removed on the node.
>  * NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  * NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  * NM has bugs, such as wrongly mark live container as complete.
> Things become worse if work-preserving NM restart enabled, see 
> [YARN-1336|https://issues.apache.org/jira/browse/YARN-1336]
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN and the node:
>  ** Cause YARN and node resource leak
>  ** Cannot kill the container to release YARN resource on the node
>  # Container and App killing is not eventually consistent for user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container* is a container which is no longer managed by the NM. Thus, it can no longer be managed by YARN either.

*There are many cases in which a YARN-managed container can become unmanaged, such as:*
 # For container resources managed by YARN, such as the container job object and disk data:
 ** The NM service is disabled or removed on the node.
 ** The NM is unable to start up again on the node, for example because its dependent configuration or resources cannot be made ready.
 ** The NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
 ** The NM has bugs, such as wrongly marking a live container as complete.
 # For container resources unmanaged by YARN:
 ** The user breaks processes away from the container job object.
 ** The user creates VMs from within the container job object.
 ** The user acquires other resources on the machine which are unmanaged by YARN, such as producing data outside the container folder.

*Bad impacts of an unmanaged container include:*
 # Resources cannot be managed for YARN and the node:
 ** Causes a YARN and node resource leak.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and App killing is not eventually consistent for the user:
 ** An App which has bugs can still produce bad impacts outside even long after the App has been killed.

  was:
An *unmanaged container* is a container which is no longer managed by NM. Thus, 
it is cannot be managed by YARN, too.

*There are many cases a YARN managed container can become unmanaged, such as:*
 # For container resource managed by YARN, such as container job object
 and disk data:
 ** NM service is disabled or removed on the node.
 ** NM is unable to start up again on the node, such as depended configuration, 
or resources cannot be ready.
 ** NM local leveldb store is corrupted or lost, such as bad disk sectors.
 ** NM has bugs, such as wrongly mark live container as complete.
 #  For container resource unmanaged by YARN:
 ** User breakaway processes from container job object.
 ** User creates VMs from container job object.
 ** User acquires other resource on the machine which is unmanaged by
 YARN, such as produce data outside Container folder.

*Bad impacts of unmanaged container, such as:*
 # Resource cannot be managed for YARN and the node:
 ** Cause YARN and node resource leak
 ** Cannot kill the container to release YARN resource on the node
 # Container and App killing is not eventually consistent for user:
 ** App which has bugs can still produce bad impacts to outside even if the App 
is killed for a long time

*Initial patch for review:*

For the initial patch, the unmanaged container cleanup feature on Windows, only 
can cleanup the container job object of the unmanaged container. Cleanup for 
more container resources will be supported. And the UT will be added if the 
design is agreed.

The current container will be considered as unmanaged when:
 # NM is dead:
 ** Failed to check whether container is managed by NM within timeout.
 # NM is alive but container is
 org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE
 or not found:
 ** The container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE 
or
 not found in the NM container list.


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container* is a container which is no longer managed by NM. 
> Thus, it is cannot be managed by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  # For container resource managed by YARN, such as container job object
>  and disk data:
>  ** NM service is disabled or removed on the node.
>  ** NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  ** NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  ** NM has bugs, such as wrongly mark live container as complete.
>  #  For container resource unmanaged by YARN:
>  ** User breakaway processes from container job object.
>  ** User creates VMs from container job object.
>  ** User acquires other resource on the machine which is unmanaged by
>  YARN, such as produce data outside Container folder.
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN and the node:
>  ** Cause YARN and node resource leak
>  ** Cannot kill the container to release YARN resource on the node
>  # Container and App 

[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Attachment: YARN-8012-branch-2.7.1.001.patch

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container* is a container which is no longer managed by NM. 
> Thus, it is cannot be managed by YARN, too.
> *There are many cases a YARN managed container can become unmanaged, such as:*
>  # For container resource managed by YARN, such as container job object
>  and disk data:
>  ** NM service is disabled or removed on the node.
>  ** NM is unable to start up again on the node, such as depended 
> configuration, or resources cannot be ready.
>  ** NM local leveldb store is corrupted or lost, such as bad disk sectors.
>  ** NM has bugs, such as wrongly mark live container as complete.
>  #  For container resource unmanaged by YARN:
>  ** User breakaway processes from container job object.
>  ** User creates VMs from container job object.
>  ** User acquires other resource on the machine which is unmanaged by
>  YARN, such as produce data outside Container folder.
> *Bad impacts of unmanaged container, such as:*
>  # Resource cannot be managed for YARN and the node:
>  ** Cause YARN and node resource leak
>  ** Cannot kill the container to release YARN resource on the node
>  # Container and App killing is not eventually consistent for user:
>  ** App which has bugs can still produce bad impacts to outside even if the 
> App is killed for a long time
> *Initial patch for review:*
> For the initial patch, the unmanaged container cleanup feature on Windows, 
> only can cleanup the container job object of the unmanaged container. Cleanup 
> for more container resources will be supported. And the UT will be added if 
> the design is agreed.
> The current container will be considered as unmanaged when:
>  # NM is dead:
>  ** Failed to check whether container is managed by NM within timeout.
>  # NM is alive but container is
>  org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE
>  or not found:
>  ** The container is 
> org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or
>  not found in the NM container list.






[jira] [Created] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)
Yuqi Wang created YARN-8012:
---

 Summary: Support Unmanaged Container Cleanup
 Key: YARN-8012
 URL: https://issues.apache.org/jira/browse/YARN-8012
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Yuqi Wang
Assignee: Yuqi Wang
 Fix For: 2.7.1


An *unmanaged container* is a container which is no longer managed by the NM. Thus, it can no longer be managed by YARN either.

*There are many cases in which a YARN-managed container can become unmanaged, such as:*
 # For container resources managed by YARN, such as the container job object and disk data:
 ** The NM service is disabled or removed on the node.
 ** The NM is unable to start up again on the node, for example because its dependent configuration or resources cannot be made ready.
 ** The NM local leveldb store is corrupted or lost, for example due to bad disk sectors.
 ** The NM has bugs, such as wrongly marking a live container as complete.
 # For container resources unmanaged by YARN:
 ** The user breaks processes away from the container job object.
 ** The user creates VMs from within the container job object.
 ** The user acquires other resources on the machine which are unmanaged by YARN, such as producing data outside the container folder.

*Bad impacts of an unmanaged container include:*
 # Resources cannot be managed for YARN and the node:
 ** Causes a YARN and node resource leak.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and App killing is not eventually consistent for the user:
 ** An App which has bugs can still produce bad impacts outside even long after the App has been killed.

*Initial patch for review:*

For the initial patch, the unmanaged container cleanup feature on Windows can only clean up the container job object of the unmanaged container. Cleanup for more container resources will be supported, and the UT will be added once the design is agreed.

The current container will be considered unmanaged when:
 # NM is dead:
 ** Failed to check whether the container is managed by the NM within the timeout.
 # NM is alive but the container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found:
 ** The container is org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found in the NM container list.






[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Target Version/s: 3.0.0, 2.7.2  (was: 2.7.2)

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for the request, because the 
> node label is not matched when the leaf queue assign container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we clearly know 
> it should not have node label), we should use the requested resource name to 
> match with the node instead of using the requested node label to match with 
> the node. And this resource name matching should be safe, since the node 
> whose node label is not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If not acceptable (i.e. the current behavior is by designed), so, how can we 
> use locality to request container within these labeled nodes?






[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-04 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351675#comment-16351675
 ] 

Yuqi Wang commented on YARN-7872:
-

[~leftnoteasy], could you please take a look at this? :)

Appreciate your insights!

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that (at least for version 2.7 
> and 2.8), the node cannot allocate container for the request, because the 
> node label is not matched when the leaf queue assign container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we clearly know 
> it should not have node label), we should use the requested resource name to 
> match with the node instead of using the requested node label to match with 
> the node. And this resource name matching should be safe, since the node 
> whose node label is not accessible for the queue will not be sent to the leaf 
> queue.
>  
> *Discussion:*
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.
> If not acceptable (i.e. the current behavior is by designed), so, how can we 
> use locality to request container within these labeled nodes?






[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
*Issue summary:*

A labeled node (i.e. a node with a non-empty node label) cannot be used to satisfy a locality-specified request (i.e. a container request with a non-ANY resource name and relax locality set to false).

 

*For example:*

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
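For reference, a request like the one above could be issued from an AM via the public AMRMClient API roughly as follows. This is just an illustrative sketch mirroring the example values; it is not part of the attached patch.

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class PinnedRequestExample {

  public static ContainerRequest newPinnedRequest() {
    // 1 MB, 1 vCore, priority 1, as in the example above.
    Resource capability = Resource.newInstance(1, 1);
    Priority priority = Priority.newInstance(1);
    // Pin the request to host "SRG" and disable relax locality; no node label is set,
    // since a locality specified request is not allowed to carry one.
    return new ContainerRequest(capability,
        new String[] {"SRG"},  // hosts
        null,                  // racks
        priority,
        false);                // relaxLocality
  }
}
{code}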

The current RM capacity scheduler behavior (at least for versions 2.7 and 2.8) is that the node cannot allocate a container for the request, because the node label is not matched when the leaf queue assigns containers.

 

*Possible solution:*

However, node locality and node label should be two orthogonal dimensions for selecting candidate nodes for a container request. Node label matching should only be executed for container requests with the ANY resource name, since only this kind of container request is allowed to have a non-empty node label.

So, for a container request with a non-ANY resource name (where we clearly know it should not have a node label), we should use the requested resource name to match against the node instead of using the requested node label. This resource name matching should be safe, since a node whose node label is not accessible to the queue will never be sent to the leaf queue.
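To make the proposed rule concrete, here is a minimal, self-contained sketch of the matching decision described above. The class and method names are illustrative; this is not the actual CapacityScheduler code or the attached patch.

{code:java}
public final class LocalityLabelMatching {

  public static final String ANY = "*"; // same value as ResourceRequest.ANY

  /** Returns true if the node may be used to satisfy the request. */
  public static boolean canUseNode(String requestedResourceName, String requestedNodeLabel,
      String nodeHostName, String nodeRackName, String nodeLabel) {
    if (!ANY.equals(requestedResourceName)) {
      // Locality specified request: match by resource name (host or rack).
      // Such a request cannot carry a node label, so do not reject the node
      // just because its label differs.
      return requestedResourceName.equals(nodeHostName)
          || requestedResourceName.equals(nodeRackName);
    }
    // ANY request: the only kind that may carry a node label, so apply the
    // usual node label matching.
    String requested = requestedNodeLabel == null ? "" : requestedNodeLabel;
    String actual = nodeLabel == null ? "" : nodeLabel;
    return requested.equals(actual);
  }

  public static void main(String[] args) {
    // The example above: node "SRG" labeled "persistent", request pinned to host "SRG"
    // with no node label. The proposed rule accepts the node.
    System.out.println(canUseNode("SRG", null, "SRG", "/default-rack", "persistent")); // true
  }
}
{code}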

 

*Discussion:*

The attachment is the fix according to this principle; please help to review.

Without it, we cannot use locality to request containers within these labeled nodes.

If the fix is acceptable, we should also recheck whether the same issue happens in trunk and other Hadoop versions.

If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within these labeled nodes?

  was:
*Issue summary:*

labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

 

*For example:*

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that (at least for version 2.7 and 
2.8), the node cannot allocate container for the request, because the node 
label is not matched when the leaf queue assign container.

 

*Possible solution:*

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use the requested resource name to match with 
the node instead of using the requested node label to match with the node. And 
this resource name matching should be safe, since the node whose node label is 
not accessible for the queue will not be sent to the leaf queue.

 

*Discussion:*

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk and other hadoop versions.

If not acceptable (i.e. the current behavior is by designed), so, how can we 
use locality to request container within these labeled nodes?


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
*Issue summary:*

labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

 

*For example:*

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current RM capacity scheduler behavior (at least for versions 2.7 and 2.8) is that the node cannot allocate a container for the request, because the node label is not matched when the leaf queue assigns containers.

 

*Possible solution:*

However, node locality and node label should be two orthogonal dimensions for selecting candidate nodes for a container request. Node label matching should only be executed for container requests with the ANY resource name, since only this kind of container request is allowed to have a non-empty node label.

So, for a container request with a non-ANY resource name (where we know it should not have a node label), we should use the requested resource name to match against the node instead of using the requested node label. This resource name matching should be safe, since a node whose node label is not accessible to the queue will never be sent to the leaf queue.

 

*Discussion:*

The attachment is the fix according to this principle; please help to review.

Without it, we cannot use locality to request containers within these labeled nodes.

If the fix is acceptable, we should also recheck whether the same issue happens in trunk and other Hadoop versions.

If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within these labeled nodes?

  was:
*Issue summary:*

labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

 

*For example:*

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that (at least for version 2.7 and 
2.8), the node cannot allocate container for the request, because the node 
label is not matched when the leaf queue assign container.

 

*Possible solution:*

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

 

*Discussion:*

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk and other hadoop versions.

If not acceptable (i.e. the current behavior is by designed), so, how can we 
use locality to request container within these labeled nodes?


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
*Issue summary:*

labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

 

*For example:*

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current RM capacity scheduler behavior (at least for versions 2.7 and 2.8) is that the node cannot allocate a container for the request, because the node label is not matched when the leaf queue assigns containers.

 

*Possible solution:*

However, node locality and node label should be two orthogonal dimensions for selecting candidate nodes for a container request. Node label matching should only be executed for container requests with the ANY resource name, since only this kind of container request is allowed to have a non-empty node label.

So, for a container request with a non-ANY resource name (where we know it should not have a node label), we should use the resource name to match against the node instead of using the node label. This resource name matching should be safe, since a node whose node label is not accessible to the queue will never be sent to the leaf queue.

 

*Discussion:*

The attachment is the fix according to this principle; please help to review.

Without it, we cannot use locality to request containers within these labeled nodes.

If the fix is acceptable, we should also recheck whether the same issue happens in trunk and other Hadoop versions.

If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within these labeled nodes?

  was:
*Issue summary:*

labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

 

*For example:*

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request, because the node label is not matched when the leaf 
queue assign container.

 

*Possible solution:*

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

 

*Discussion:*

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk and other hadoop versions.

If not acceptable (i.e. the current behavior is by designed), so, how can we 
use locality to request container within these labeled nodes?


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
*Issue summary:*

labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

 

*For example:*

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current RM capacity scheduler behavior is that the node cannot allocate a container for the request, because the node label is not matched when the leaf queue assigns containers.

 

*Possible solution:*

However, node locality and node label should be two orthogonal dimensions for selecting candidate nodes for a container request. Node label matching should only be executed for container requests with the ANY resource name, since only this kind of container request is allowed to have a non-empty node label.

So, for a container request with a non-ANY resource name (where we know it should not have a node label), we should use the resource name to match against the node instead of using the node label. This resource name matching should be safe, since a node whose node label is not accessible to the queue will never be sent to the leaf queue.

 

*Discussion:*

The attachment is the fix according to this principle; please help to review.

Without it, we cannot use locality to request containers within these labeled nodes.

If the fix is acceptable, we should also recheck whether the same issue happens in trunk and other Hadoop versions.

If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within these labeled nodes?

  was:
*Issue summary:*

labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

 

*For example:*

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

 

*Possible solution:*

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

 

*Discussion:*

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk and other hadoop versions.

If not acceptable (i.e. the current behavior is by designed), so, how can we 
use locality to request container within these labeled nodes?


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
*Issue summary:*

labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

 

*For example:*

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current RM capacity scheduler behavior is that the node cannot allocate a container for the request, because the node label is not matched when the leaf queue assigns containers.

 

*Possible solution:*

However, node locality and node label should be two orthogonal dimensions for selecting candidate nodes for a container request. Node label matching should only be executed for container requests with the ANY resource name, since only this kind of container request is allowed to have a non-empty node label.

So, for a container request with a non-ANY resource name (where we know it should not have a node label), we should use the resource name to match against the node instead of using the node label. This resource name matching should be safe, since a node whose node label is not accessible to the queue will never be sent to the leaf queue.

 

*Discussion:*

The attachment is the fix according to this principle; please help to review.

Without it, we cannot use locality to request containers within these labeled nodes.

If the fix is acceptable, we should also recheck whether the same issue happens in trunk and other Hadoop versions.

If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within these labeled nodes?

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

*Attachment is the fix according to this principle, please help to review.*

*Without it, we cannot use locality to request container within these labeled 
nodes.*

*If the fix is acceptable, we should also recheck whether the same issue 
happens in trunk and other hadoop versions.*

*If not* *acceptable (i.e. the current behavior is by designed), so, how can we 
use* *locality to request container within these labeled nodes?*


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current RM capacity scheduler behavior is that the node cannot allocate a container for the request, because the node label is not matched when the leaf queue assigns containers.

However, node locality and node label should be two orthogonal dimensions for selecting candidate nodes for a container request. Node label matching should only be executed for container requests with the ANY resource name, since only this kind of container request is allowed to have a non-empty node label.

So, for a container request with a non-ANY resource name (where we know it should not have a node label), we should use the resource name to match against the node instead of using the node label. This resource name matching should be safe, since a node whose node label is not accessible to the queue will never be sent to the leaf queue.

*The attachment is the fix according to this principle; please help to review.*

*Without it, we cannot use locality to request containers within these labeled nodes.*

*If the fix is acceptable, we should also recheck whether the same issue happens in trunk and other Hadoop versions.*

*If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within labeled nodes?*

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

*Attachment is the fix according to this principle, please help to review.*

*Without it, we cannot use locality to request container within these labeled 
nodes.*

*If the fix is acceptable, we should also recheck whether the same issue 
happens in trunk and other hadoop versions.*

*If not* *acceptable (i.e. the current behavior is by designed), so, how can we 
specify* *locality to request container within labeled nodes?***


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
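
To make the example concrete, a request of this shape could be issued from an application master roughly as below. This is only a sketch against the AMRMClient API: the host name SRG and the 1 MB / 1 vcore / priority 1 values are taken from the example above, while the class name and the setup around the request are illustrative.

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class StrictLocalityRequestSketch {
  public static void main(String[] args) {
    AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    amrmClient.init(new YarnConfiguration());
    amrmClient.start();
    // (Registration with the RM and the allocate() loop are elided here.)

    // 1 MB / 1 vcore at priority 1, as in the container request above.
    Resource capability = Resource.newInstance(1, 1);
    Priority priority = Priority.newInstance(1);

    // Ask for the specific host "SRG", no racks, and do not relax locality,
    // so the container may only be placed on that host.  No node label
    // expression is set, matching "NodeLabel: [null]" in the example.
    ContainerRequest onSrgOnly = new ContainerRequest(
        capability,
        new String[] {"SRG"},  // hosts
        null,                  // racks
        priority,
        false);                // relaxLocality

    amrmClient.addContainerRequest(onSrgOnly);
    // A subsequent allocate() heartbeat would carry this ask to the RM.
  }
}
{code}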

The current behavior of the RM capacity scheduler is that the node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.
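
As an aside on this last point, the constraint also shows up in the client API: only an ANY request (no hosts, no racks) can carry a node label expression, while a locality-constrained request leaves it unset. A minimal sketch follows (the class name is invented; to the best of my knowledge the 2.x AMRMClient rejects a request that combines explicit hosts or racks with a label expression, though the exact validation point may differ by version).

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class LabelOnlyOnAnyRequestSketch {
  public static void main(String[] args) {
    Resource capability = Resource.newInstance(1, 1);
    Priority priority = Priority.newInstance(1);

    // Legal shape: an ANY request (no hosts, no racks) selecting nodes by label.
    // relaxLocality must stay true for a request without location constraints.
    ContainerRequest anyOnPersistentNodes = new ContainerRequest(
        capability,
        null,            // hosts: none -> resource name is ANY ("*")
        null,            // racks: none
        priority,
        true,            // relaxLocality
        "persistent");   // node label expression, as on the example node above

    // A host-specific request, by contrast, leaves the label expression unset
    // (see the request for "SRG" above); combining both in one request is not
    // a supported shape.
    System.out.println(anyOnPersistentNodes);
  }
}
{code}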

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.
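
The principle in the two paragraphs above can be summarized with a small sketch of the intended matching decision. This is illustrative only and is not the attached patch: the method and constant names are invented, and the real scheduler works on ResourceRequest and SchedulerNode objects rather than plain strings.

{code:java}
/**
 * Illustrative sketch of the matching principle described above: node
 * locality and node label are treated as orthogonal dimensions.
 */
final class NodeMatchingSketch {

  static final String RESOURCE_ANY = "*";  // same value as ResourceRequest.ANY

  static boolean canAssign(String requestResourceName,
                           String requestNodeLabel,  // null/empty means no label
                           String nodeHostName,
                           String nodeRackName,
                           String nodeLabel) {
    if (RESOURCE_ANY.equals(requestResourceName)) {
      // Only an ANY request may carry a node label expression,
      // so label matching is applied here.
      String wanted = requestNodeLabel == null ? "" : requestNodeLabel;
      String actual = nodeLabel == null ? "" : nodeLabel;
      return wanted.equals(actual);
    }
    // Host- or rack-local request: it cannot carry a node label, so match
    // purely by resource name.  Queue-level label accessibility is assumed
    // to have been enforced before the node reaches the leaf queue.
    return requestResourceName.equals(nodeHostName)
        || requestResourceName.equals(nodeRackName);
  }

  public static void main(String[] args) {
    // The example above: node "SRG" labeled "persistent", host-local request
    // for "SRG" with no label and relaxLocality = false.
    System.out.println(
        canAssign("SRG", null, "SRG", "/default-rack", "persistent"));
    // Prints true under this principle, whereas matching by node label
    // (the current behavior) rejects the node for this request.
  }
}
{code}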

*Attachment is the fix according to this principle, please help to review.*

*Without it, we cannot use locality to request container within these labeled 
nodes.*

*If the fix is acceptable, we should also recheck whether the same issue 
happens in trunk and other hadoop versions.*

*If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within these labeled nodes?*

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

*Attachment is the fix according to this principle, please help to review.*

*Without it, we cannot use locality to request container within these labeled 
nodes.*

*If the fix is acceptable, we should also recheck whether the same issue 
happens in trunk and other hadoop versions.*

*If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within labeled nodes?*


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is that the node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

*Attachment is the fix according to this principle, please help to review.*

*Without it, we cannot use locality to request container within these labeled 
nodes.*

*If the fix is acceptable, we should also recheck whether the same issue 
happens in trunk and other hadoop versions.*

*If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within labeled nodes?*

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

*Attachment is the fix according to this principle, please help to review.*

*Without it, we cannot use locality to request container within these labeled 
nodes.*

*If the fix is acceptable, we should also recheck whether the same issue 
happens in trunk and other hadoop versions.*

*If it is not acceptable (i.e. the current behavior is by design), then how can we use locality to request containers within labeled nodes?*


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is that the node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

*Attachment is the fix according to this principle, please help to review.*

*Without it, we cannot use locality to request container within these labeled 
nodes.*

*If the fix is acceptable, we should also recheck whether the same issue 
happens in trunk and other hadoop versions.*

*If it is not acceptable (i.e. the current behavior is by design), then how can we specify locality to request containers within labeled nodes?*

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

*Attachment is the fix according to this principle, please help to review.*

*Without it, we cannot use locality to request container within these labeled 
nodes.*

*If the fix is acceptable, we should also recheck whether the same issue 
happens in trunk and other hadoop versions.*


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is that the node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

*Attachment is the fix according to this principle, please help to review.*

*Without it, we cannot use locality to request container within these labeled 
nodes.*

*If the fix is acceptable, we should also recheck whether the same issue 
happens in trunk and other hadoop versions.*

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk and other hadoop versions.


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity 

[jira] [Comment Edited] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348571#comment-16348571
 ] 

Yuqi Wang edited comment on YARN-7872 at 2/1/18 1:26 PM:
-

Just an initial patch to trigger Jenkins.


was (Author: yqwang):
Just an initial try

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that, the node cannot allocate 
> container for the request because of the node label not matched in the leaf 
> queue assign container.
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we know it should 
> not have node label), we should use resource name to match with the node 
> instead of using node label to match with the node. And this resource name 
> matching should be safe, since the node whose node label is not accessible 
> for the queue will not be sent to the leaf queue.
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.






[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Attachment: YARN-7872-branch-2.7.2.001.patch

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that, the node cannot allocate 
> container for the request because of the node label not matched in the leaf 
> queue assign container.
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we know it should 
> not have node label), we should use resource name to match with the node 
> instead of using node label to match with the node. And this resource name 
> matching should be safe, since the node whose node label is not accessible 
> for the queue will not be sent to the leaf queue.
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk and other hadoop versions.






[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is that the node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk and other hadoop versions.

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that, the node cannot allocate 
> container for the request because of the 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Fix Version/s: (was: 2.8.0)

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that, the node cannot allocate 
> container for the request because of the node label not matched in the leaf 
> queue assign container.
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we know it should 
> not have node label), we should use resource name to match with the node 
> instead of using node label to match with the node. And this resource name 
> matching should be safe, since the node whose node label is not accessible 
> for the queue will not be sent to the leaf queue.
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk.






[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Target Version/s: 2.7.2  (was: 2.8.0, 2.7.2)

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that, the node cannot allocate 
> container for the request because of the node label not matched in the leaf 
> queue assign container.
> However, node locality and node label should be two orthogonal dimensions to 
> select candidate nodes for container request. And the node label matching 
> should only be executed for container request with ANY resource name, since 
> only this kind of container request is allowed to have 'not empty' node label.
> So, for container request with 'not ANY' resource name (so, we know it should 
> not have node label), we should use resource name to match with the node 
> instead of using node label to match with the node. And this resource name 
> matching should be safe, since the node whose node label is not accessible 
> for the queue will not be sent to the leaf queue.
> Attachment is the fix according to this principle, please help to review.
> Without it, we cannot use locality to request container within these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> happens in trunk.






[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is that the node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent to the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.8.0, 2.7.2
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that, the node cannot allocate 
> container for the request because of the node label not 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is that the node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of using node label to match with the node. And this resource name 
matching should be safe, since the node whose node label is not accessible for 
the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of node label to match with the node. And it should be safe, since the 
node which is not accessible for the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.8.0, 2.7.2
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that, the node cannot allocate 
> container for the request because of the node label not matched in the leaf 
> queue assign container.

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is that the node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (so, we know it should 
not have node label), we should use resource name to match with the node 
instead of node label to match with the node. And it should be safe, since the 
node which is not accessible for the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

Current RM capacity scheduler's behavior is that, the node cannot allocate 
container for the request because of the node label not matched in the leaf 
queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (besides, it should not 
have node label), we should use resource name to match with the node instead of 
node label to match with the node. And it should be safe, since the node which 
is not accessible for the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.8.0, 2.7.2
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that, the node cannot allocate 
> container for the request because of the node label not matched in the leaf 
> queue assign container.
> However, node locality and node label 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is that the node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (besides, it should not 
have node label), we should use resource name to match with the node instead of 
node label to match with the node. And it should be safe, since the node which 
is not accessible for the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is:
 The node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (besides, it should not 
have node label), we should use resource name to match with the node instead of 
node label to match with the node. And it should be safe, since the node which 
is not accessible for the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.8.0, 2.7.2
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behavior is that, the node cannot allocate 
> container for the request because of the node label not matched in the leaf 
> queue assign container.
> However, node locality and node label should be two 

[jira] [Updated] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:

Description: 
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current behavior of the RM capacity scheduler is:
 The node cannot allocate a container for the request, because the node label does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (besides, it should not 
have node label), we should use resource name to match with the node instead of 
node label to match with the node. And it should be safe, since the node which 
is not accessible for the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.

  was:
labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
[Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

 

Current RM capacity scheduler's behaviour is:
The node cannot allocate container for the request because of the node label 
not matched in the leaf queue assign container.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (besides, it should not 
have node label), we should use resource name to match with the node instead of 
node label to match with the node. And it should be safe, since the node which 
is not accessible for the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.


> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.8.0, 2.7.2
>
>
> labeled node (i.e. node with 'not empty' node label) cannot be used to 
> satisfy locality specified request (i.e. container request with 'not ANY' 
> resource name and the relax locality is false).
> For example:
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> Current RM capacity scheduler's behaviour is:
>  The node cannot allocate a container for the request, because the node label 
> does not match during the leaf queue's container assignment.
> However, node locality and node label should be two 

[jira] [Created] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-01 Thread Yuqi Wang (JIRA)
Yuqi Wang created YARN-7872:
---

 Summary: labeled node cannot be used to satisfy locality specified 
request
 Key: YARN-7872
 URL: https://issues.apache.org/jira/browse/YARN-7872
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, capacityscheduler, resourcemanager
Affects Versions: 2.7.2
Reporter: Yuqi Wang
Assignee: Yuqi Wang
 Fix For: 2.7.2, 2.8.0


labeled node (i.e. node with 'not empty' node label) cannot be used to satisfy 
locality specified request (i.e. container request with 'not ANY' resource name 
and the relax locality is false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
[Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

 

Current RM capacity scheduler's behaviour is:
The node cannot allocate a container for the request, because the node label 
does not match during the leaf queue's container assignment.

However, node locality and node label should be two orthogonal dimensions to 
select candidate nodes for container request. And the node label matching 
should only be executed for container request with ANY resource name, since 
only this kind of container request is allowed to have 'not empty' node label.

So, for container request with 'not ANY' resource name (besides, it should not 
have node label), we should use resource name to match with the node instead of 
node label to match with the node. And it should be safe, since the node which 
is not accessible for the queue will not be sent in the leaf queue.

Attachment is the fix according to this principle, please help to review.

Without it, we cannot use locality to request container within these labeled 
nodes.

If the fix is acceptable, we should also recheck whether the same issue happens 
in trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-18 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133886#comment-16133886
 ] 

Yuqi Wang commented on YARN-6959:
-

[~jianhe]
Great! Thank you so much!

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959-branch-2.8.002.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this patch, RM will definitely record ResourceRequests from 
> different attempts into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still records ResourceRequests from an old attempt at any time, 
> these ResourceRequests will be recorded in the old AppSchedulingInfo object, 
> which will not impact the current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is confusing; we 
> had better rename it to getCurrentApplicationAttempt, and reconsider 
> whether there are any other bugs related to getApplicationAttempt.
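
The separation the patch relies on can be illustrated with a minimal, hypothetical 
sketch (plain Java with invented names, not the RM's real AbstractYarnScheduler or 
AppSchedulingInfo code): each attempt owns its own scheduling-info object, so a late 
allocate() call carrying an old attempt id can only touch the old attempt's requests.

{code:java}
// Simplified model for illustration only; the real RM data structures differ.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerAttemptSchedulingInfoDemo {

  static final class AppSchedulingInfo {
    final List<String> resourceRequests = new ArrayList<>();
  }

  // Keyed by attempt id instead of pointing at a single "current attempt" object.
  static final Map<Integer, AppSchedulingInfo> infoByAttempt = new HashMap<>();

  static void allocate(int attemptId, String ask) {
    // Requests are recorded against the attempt they were sent for,
    // never against whichever attempt happens to be current.
    infoByAttempt.computeIfAbsent(attemptId, id -> new AppSchedulingInfo())
        .resourceRequests.add(ask);
  }

  public static void main(String[] args) {
    allocate(1, "late ask from attempt 1");        // stale call after attempt 1 failed
    allocate(2, "AM container ask for attempt 2"); // new attempt's own request
    // Attempt 2's requests are unaffected by the stale attempt 1 ask.
    System.out.println(infoByAttempt.get(2).resourceRequests);
  }
}
{code}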



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-18 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133224#comment-16133224
 ] 

Yuqi Wang commented on YARN-6959:
-

[~jianhe]
It seems the UT failures are not caused by my patch; please check.

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959-branch-2.8.002.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-18 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.8.002.patch

Update 2.8 patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959-branch-2.8.002.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129744#comment-16129744
 ] 

Yuqi Wang commented on YARN-6959:
-

[~jianhe]
It seems the UT failures are not caused by my patch; please check.

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959.yarn_nm.log.zip, YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.006.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959.yarn_nm.log.zip, YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959.003.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959-branch-2.7.003.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959-branch-2.7.001.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959.002.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959-branch-2.7.002.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959.004.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959-branch-2.7.004.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959.001.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.002.patch, YARN-6959.003.patch, 
> YARN-6959.004.patch, YARN-6959.005.patch, YARN-6959-branch-2.7.001.patch, 
> YARN-6959-branch-2.7.002.patch, YARN-6959-branch-2.7.003.patch, 
> YARN-6959-branch-2.7.004.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.005.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.002.patch, YARN-6959.003.patch, 
> YARN-6959.004.patch, YARN-6959.005.patch, YARN-6959-branch-2.7.001.patch, 
> YARN-6959-branch-2.7.002.patch, YARN-6959-branch-2.7.003.patch, 
> YARN-6959-branch-2.7.004.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.004.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959-branch-2.7.002.patch, 
> YARN-6959-branch-2.7.003.patch, YARN-6959-branch-2.7.004.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-15 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128348#comment-16128348
 ] 

Yuqi Wang commented on YARN-6959:
-

[~jianhe]

I updated the new patch for 2.7; do you know how to trigger Jenkins against 
branch-2.7?

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959-branch-2.7.002.patch, 
> YARN-6959-branch-2.7.003.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959.yarn_nm.log.zip, YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-15 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.003.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959-branch-2.7.002.patch, 
> YARN-6959-branch-2.7.003.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959.yarn_nm.log.zip, YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is not matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is so confusing, 
> we should better rename it to getCurrentApplicationAttempt. And reconsider 
> whether there are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


