[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-09-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726498#comment-14726498
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2279 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2279/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 
4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt


> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed, or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> first and then the node_resource_update event.
> But if the node is reconnected with a different HTTP port, the order of the 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.
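
A minimal, hypothetical Java sketch of the ordering problem described above. ToyEventType, ToyEvent, ToyScheduler and ReconnectOrderSketch are illustrative names, not Hadoop classes, and this is not the committed patch. The only assumption borrowed from the description is that a scheduler resource update for a node it does not know about fails. Replaying the reported order (removed, resource update, added) fails, while the sequential order (removed, added, then resource update) succeeds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the YARN-3222 ordering problem. ToyEventType,
// ToyEvent and ToyScheduler are illustrative stand-ins, NOT Hadoop classes.
public class ReconnectOrderSketch {

    enum ToyEventType { NODE_ADDED, NODE_REMOVED, NODE_RESOURCE_UPDATE }

    static final class ToyEvent {
        final ToyEventType type;
        final String nodeId;
        final int memoryMb;

        ToyEvent(ToyEventType type, String nodeId, int memoryMb) {
            this.type = type;
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
        }
    }

    // Assumption: like the scheduler in the report, a resource update
    // expects the node to already be registered.
    static final class ToyScheduler {
        private final Map<String, Integer> nodeMemoryMb = new HashMap<>();

        void handle(ToyEvent e) {
            switch (e.type) {
                case NODE_ADDED:
                    nodeMemoryMb.put(e.nodeId, e.memoryMb);
                    break;
                case NODE_REMOVED:
                    nodeMemoryMb.remove(e.nodeId);
                    break;
                case NODE_RESOURCE_UPDATE:
                    if (!nodeMemoryMb.containsKey(e.nodeId)) {
                        // Stands in for the NPE reported when the node is missing.
                        throw new NullPointerException("node " + e.nodeId + " not found");
                    }
                    nodeMemoryMb.put(e.nodeId, e.memoryMb);
                    break;
            }
        }

        Integer memoryOf(String nodeId) {
            return nodeMemoryMb.get(nodeId);
        }
    }

    // Order reported for a reconnect with a different HTTP port.
    static List<ToyEvent> reportedOrder(String nodeId, int newMemoryMb) {
        List<ToyEvent> events = new ArrayList<>();
        events.add(new ToyEvent(ToyEventType.NODE_REMOVED, nodeId, 0));
        events.add(new ToyEvent(ToyEventType.NODE_RESOURCE_UPDATE, nodeId, newMemoryMb));
        events.add(new ToyEvent(ToyEventType.NODE_ADDED, nodeId, newMemoryMb));
        return events;
    }

    // Sequential order asked for in the issue: the node is (re)added before
    // any resource update for it is delivered.
    static List<ToyEvent> sequentialOrder(String nodeId, int newMemoryMb) {
        List<ToyEvent> events = new ArrayList<>();
        events.add(new ToyEvent(ToyEventType.NODE_REMOVED, nodeId, 0));
        events.add(new ToyEvent(ToyEventType.NODE_ADDED, nodeId, newMemoryMb));
        events.add(new ToyEvent(ToyEventType.NODE_RESOURCE_UPDATE, nodeId, newMemoryMb));
        return events;
    }

    // Registers the node once, then replays the reconnect events in order.
    static String replay(List<ToyEvent> events, String nodeId, int initialMemoryMb) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.handle(new ToyEvent(ToyEventType.NODE_ADDED, nodeId, initialMemoryMb));
        try {
            for (ToyEvent e : events) {
                scheduler.handle(e);
            }
            return "OK memory=" + scheduler.memoryOf(nodeId);
        } catch (NullPointerException npe) {
            return "NPE: " + npe.getMessage();
        }
    }

    public static void main(String[] args) {
        String node = "host1:45454"; // hypothetical node id
        System.out.println(replay(reportedOrder(node, 8192), node, 4096));
        System.out.println(replay(sequentialOrder(node, 8192), node, 4096));
    }
}
```

Running main prints "NPE: node host1:45454 not found" for the reported order and "OK memory=8192" for the sequential one. The toy keeps a single map keyed by node id so the only thing that changes between the two runs is event order.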



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-09-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726377#comment-14726377
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #330 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/330/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 
4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-09-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726371#comment-14726371
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1065 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1065/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 
4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-09-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726373#comment-14726373
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #321 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/321/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 
4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-09-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726376#comment-14726376
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2260 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2260/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 
4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-09-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726378#comment-14726378
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #338 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/338/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 
4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-09-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726216#comment-14726216
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8382 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8382/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 
4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-07-27 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642813#comment-14642813
 ] 

Sangjin Lee commented on YARN-3222:
---

The merge to 2.6.0 is straightforward.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed, or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> first and then the node_resource_update event.
> But if the node is reconnected with a different HTTP port, the order of the 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347037#comment-14347037
 ] 

Hudson commented on YARN-3222:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed, or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> first and then the node_resource_update event.
> But if the node is reconnected with a different HTTP port, the order of the 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347000#comment-14347000
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346934#comment-14346934
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #113 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/113/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346925#comment-14346925
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2054 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2054/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346720#comment-14346720
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #856 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/856/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346710#comment-14346710
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #122 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/122/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346104#comment-14346104
 ] 

Hudson commented on YARN-3222:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7248 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7248/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is 
reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev 
b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346069#comment-14346069
 ] 

Jian He commented on YARN-3222:
---

Thanks! Committing.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 
> 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed, or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> first and then the node_resource_update event.
> But if the node is reconnected with a different HTTP port, the order of the 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> so the scheduler does not find the node, throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346013#comment-14346013
 ] 

Rohith commented on YARN-3222:
--

Had a glance at the javac and javadoc warnings; they look unrelated to the patch.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345987#comment-14345987
 ] 

Hadoop QA commented on YARN-3222:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702276/0005-YARN-3222.patch
  against trunk revision e17e5ba.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The applied patch generated 1151 javac 
compiler warnings (more than the trunk's current 185 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-distcp.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6828//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6828//console

This message is automatically generated.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345909#comment-14345909
 ] 

Rohith commented on YARN-3222:
--

bq. check you added earlier about sending NodeResourceUpdate event only if the 
node resource is different
Agreed.

Updated the patch to address the above comment. Kindly review it.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345513#comment-14345513
 ] 

Jian He commented on YARN-3222:
---

Thanks Rohith!
I think the condition check you added earlier, sending the NodeResourceUpdate
event only if the node resource is different, is useful: it saves some
traffic. Would you mind adding that too?
{code}
if (rmNode.getState().equals(NodeState.RUNNING)) {
  // Update scheduler node's capacity for reconnect node.
  rmNode.context
      .getDispatcher()
      .getEventHandler()
      .handle(
          new NodeResourceUpdateSchedulerEvent(rmNode, ResourceOption
              .newInstance(newNode.getTotalCapability(), -1)));
}
{code}
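A rough sketch of the guard being requested, under stated assumptions: `Res` is a hypothetical, simplified resource (memory in MB plus vcores) and `ReconnectGuard.sentUpdates` records what would have been dispatched instead of building a real `NodeResourceUpdateSchedulerEvent`. It is not the committed patch; it only shows "send the update only when the node is RUNNING and its capability actually differs".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for a node resource: memory (MB) and vcores.
final class Res {
  final int memoryMb;
  final int vcores;

  Res(int memoryMb, int vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Res)) {
      return false;
    }
    Res other = (Res) o;
    return memoryMb == other.memoryMb && vcores == other.vcores;
  }

  @Override
  public int hashCode() {
    return Objects.hash(memoryMb, vcores);
  }
}

enum ToyNodeState { RUNNING, UNHEALTHY }

class ReconnectGuard {
  // Records the updates that would have been dispatched, instead of sending them.
  final List<Res> sentUpdates = new ArrayList<>();

  void onReconnect(ToyNodeState state, Res oldCapability, Res newCapability) {
    // Skip the scheduler event when nothing changed, which saves dispatcher traffic.
    if (state == ToyNodeState.RUNNING && !oldCapability.equals(newCapability)) {
      sentUpdates.add(newCapability);
    }
  }
}
```

The value-based `equals` is the important design choice here: comparing resource objects by reference would treat every reconnect as a change and defeat the point of the check.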


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344919#comment-14344919
 ] 

Hadoop QA commented on YARN-3222:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702122/0004-YARN-3222.patch
  against trunk revision 9ae7f9e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6818//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6818//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6818//console

This message is automatically generated.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344802#comment-14344802
 ] 

Rohith commented on YARN-3222:
--

Kindly review the updated patch, which fixes points 1 and 2 as mentioned in my
earlier comment.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344790#comment-14344790
 ] 

Rohith commented on YARN-3222:
--

Raised YARN-3286 to handle the 3rd point.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-03 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344739#comment-14344739
 ] 

Rohith commented on YARN-3222:
--

Had a mail chat with [~jianhe] regarding the issues observed in this JIRA's
discussion, and we decided to split it into 2 separate JIRAs. The issues observed
in ReconnectNodeTransition are:
# As per the defect description, the order in which node_resource_update and
node_added events are sent to the scheduler. If a node_added event is being sent
to the scheduler, there is no need to send a node_resource_update event from
RMNode to the scheduler again.
# If the RMNode state is RUNNING, the node_usable event need not be sent.
# If a node is reconnected with a different capability, RMNode#totalCapability
keeps the old capability. It has to be updated with the new capability.

Points 1 and 2 will be handled in this JIRA; point 3 will be done in a separate JIRA.
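Points 1 and 2 can be read as a small decision table. The sketch below is a hedged illustration, not the patch: `plan` returns event names rather than dispatching anything, `running` is true for RUNNING and false for UNHEALTHY (the two states the transition runs in, per Jian He's later comment), and `reAdd` is a hypothetical flag meaning "the old node must be replaced", e.g. it came back with a different HTTP port. It reflects the plan at this point in the thread; later comments drop NODE_USABLE altogether.

```java
import java.util.ArrayList;
import java.util.List;

class ReconnectPlan {
  // Events a reconnect would emit, in order.
  // running: true for RUNNING, false for UNHEALTHY.
  // reAdd: hypothetical flag, the old node must be replaced by the new one.
  static List<String> plan(boolean running, boolean reAdd, boolean capabilityChanged) {
    List<String> events = new ArrayList<>();
    if (reAdd) {
      events.add("NODE_REMOVED");
      if (running) {
        // Point 1: NODE_ADDED already carries the new capability,
        // so no NODE_RESOURCE_UPDATE follows it.
        events.add("NODE_ADDED");
      }
    } else if (running && capabilityChanged) {
      events.add("NODE_RESOURCE_UPDATE");
    }
    if (!running) {
      // Point 2: only a node that is not already RUNNING gets NODE_USABLE.
      events.add("NODE_USABLE");
    }
    return events;
  }
}
```

For a RUNNING node that must be re-added, `plan(true, true, true)` yields `[NODE_REMOVED, NODE_ADDED]`, which is exactly the ordering the issue asks for.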


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-02 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344553#comment-14344553
 ] 

Rohith commented on YARN-3222:
--

I thought of the following way to handle the race in the scenario discussed above:
# If oldNode is the same as newNode and the capability has changed, first update
the resource in the scheduler.
## ClusterResource=5gb+5gb=10gb
## Update the resource with the new node capability: ClusterResource=5gb+10gb(new
capability)=15gb.
# Remove the node with the new capability
## ClusterResource=15gb-10gb(new capability)=5gb
# Add the node with the new capability
## ClusterResource=5gb+10gb=15gb, which is expected, and
{{RMNode#totalCapability}} is 10gb

Does it make sense?
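The arithmetic of the proposed sequence can be checked with a few lines. This is plain integer bookkeeping in GB, not YARN code; `ProposedSequence` is a hypothetical name, and the two-node, 5 GB to 10 GB numbers are the ones used in the comment.

```java
// Worked arithmetic (GB) for the proposed sequence on a two-node cluster where
// the reconnecting node grows from 5 GB to 10 GB. Plain ints stand in for the
// scheduler's cluster resource.
class ProposedSequence {
  // Returns {after update, after remove, after add}.
  static int[] run() {
    int otherNode = 5;
    int oldCap = 5;
    int newCap = 10;
    int cluster = otherNode + oldCap;     // 10
    // Step 1: the resource update swaps the old capability for the new one.
    cluster = cluster - oldCap + newCap;  // 15
    int afterUpdate = cluster;
    // Step 2: removal subtracts what the scheduler now holds (new capability).
    cluster -= newCap;                    // 5
    int afterRemove = cluster;
    // Step 3: the node is added back with the new capability.
    cluster += newCap;                    // 15
    return new int[] {afterUpdate, afterRemove, cluster};
  }
}
```

The key property is that step 2 subtracts the same value step 1 added, so the cluster total ends at 15 GB regardless of which capability the node reports at removal time.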


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-02 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344468#comment-14344468
 ] 

Rohith commented on YARN-3222:
--

bq. I think we may not need to call sendNodeUsableEventIfNodeStateIsNotRunning 
to send the node_usable event in ReconnectEvent. As you said earlier, the next 
heartbeat will trigger this event based on the node's own health report. 
Right, it is not required. I will remove this.

bq. The transition is invoked only at running and unhealthy state, so I think 
this is not possible? 
I see. 

bq. Even by sending an event it's still possible that removeNode was removing 
new capability from cluster resource ?
I see a potential risk even if RMNodeResourceUpdateEvent is sent. Say the
AsyncDispatcher has the events node_removed and RMNodeResourceUpdate queued. The
AsyncDispatcher fetches node_removed and puts it in the SchedulerEventDispatcher
queue. In any case, if the SchedulerEventDispatcher is delayed in processing
node_removed, perhaps because of many other scheduler events, then
RMNodeResourceUpdate is processed first. So there is a chance of removing the new
capability from the cluster resource.
Any thoughts on handling this issue?
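The interleaving described above can be replayed deterministically on one thread. In this sketch `DispatchRace`, its two `ArrayDeque`s and the `schedulerLags` flag are hypothetical: the flag stands in for the scheduler thread being behind. It assumes, as this thread indicates, that removal subtracts whatever capability the RMNode reports at the moment the scheduler processes the event.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DispatchRace {
  private int totalCapabilityGb = 5;  // what the RMNode currently reports
  private int clusterGb = 5 + 5;      // another 5 GB node plus this one
  private final Deque<String> central = new ArrayDeque<>();
  private final Deque<String> schedulerQueue = new ArrayDeque<>();

  // Returns the cluster resource (GB) after both events have been handled.
  static int run(boolean schedulerLags) {
    DispatchRace r = new DispatchRace();
    r.central.add("NODE_REMOVED");     // bound for the scheduler
    r.central.add("RESOURCE_UPDATE");  // bound for the RMNode: 5 GB -> 10 GB
    while (!r.central.isEmpty()) {
      String event = r.central.poll();
      if (event.equals("NODE_REMOVED")) {
        // Handed off to the scheduler's own queue, drained by another thread in reality.
        r.schedulerQueue.add(event);
        if (!schedulerLags) {
          r.drainScheduler();
        }
      } else {
        r.totalCapabilityGb = 10;
      }
    }
    r.drainScheduler();
    return r.clusterGb;
  }

  // Scheduler-side removal subtracts the capability reported at processing time.
  private void drainScheduler() {
    while (!schedulerQueue.isEmpty()) {
      schedulerQueue.poll();
      clusterGb -= totalCapabilityGb;
    }
  }
}
```

`run(false)` leaves 5 GB (the other node), while `run(true)` leaves 0 GB: the lagging scheduler subtracts 10 GB for a node that only ever contributed 5 GB.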


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-02 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343896#comment-14343896
 ] 

Jian He commented on YARN-3222:
---

bq.   I have handled this by sending RMNodeResourceUpdateEvent if there is any 
change in capability
Even by sending an event, isn't it still possible that removeNode removes the new
capability from the cluster resource?


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-02 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343846#comment-14343846
 ] 

Jian He commented on YARN-3222:
---

Thanks for updating.
I think we may not need to call sendNodeUsableEventIfNodeStateIsNotRunning to
send the node_usable event in ReconnectEvent. As you said earlier, the next
heartbeat will trigger this event based on the node's own health report.
bq. It means, node state can be decommissioned/lost/running
The transition is invoked only in the RUNNING and UNHEALTHY states, so I think this
is not possible?
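Jian He's point can be illustrated with a tiny transition table. `RmNodeState` is a hypothetical enum modelled on YARN's `NodeState`, and `ReconnectTable` is not the real state-machine factory; the sketch only shows that if the reconnect transition is registered for RUNNING and UNHEALTHY alone, DECOMMISSIONED or LOST never reach it, and inside it the `!= UNHEALTHY` and `== RUNNING` checks select the same states.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum modelled on YARN's NodeState.
enum RmNodeState { NEW, RUNNING, UNHEALTHY, DECOMMISSIONED, LOST, REBOOTED }

class ReconnectTable {
  // Source states from which a reconnect transition is accepted, per the comment above.
  static final Set<RmNodeState> RECONNECT_SOURCES =
      EnumSet.of(RmNodeState.RUNNING, RmNodeState.UNHEALTHY);

  static boolean accepts(RmNodeState s) {
    return RECONNECT_SOURCES.contains(s);
  }

  // Within the accepted states, "!= UNHEALTHY" and "== RUNNING" select the same states.
  static boolean checksAgree() {
    for (RmNodeState s : RECONNECT_SOURCES) {
      boolean notUnhealthy = s != RmNodeState.UNHEALTHY;
      boolean running = s == RmNodeState.RUNNING;
      if (notUnhealthy != running) {
        return false;
      }
    }
    return true;
  }
}
```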


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343106#comment-14343106
 ] 

Hadoop QA commented on YARN-3222:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701861/0003-YARN-3222.patch
  against trunk revision ca1c00b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6800//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6800//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6800//console

This message is automatically generated.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-03-02 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343036#comment-14343036
 ] 

Rohith commented on YARN-3222:
--

bq. are the test failures related ?
Yes. Since totalCapability was set directly before sending NodeRemovedEvent,
removeNode was removing the new capability from the cluster resource. I have
handled this by sending RMNodeResourceUpdateEvent if there is any change in
capability.

bq. we may not need to send the NODE_USABLE event, if the node were already at 
the running state, right ?
Yes, done.

bq. we can make the following two condition checks consistent as checking for 
RUNNING
Here the check is for any state other than UNHEALTHY. That means the node state
can be decommissioned/lost/running. I'd suggest keeping it as it is.
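The test failure explained above can be reduced to toy arithmetic. `DirectOverwrite` is a hypothetical name and the numbers (GB) are illustrative; the sketch assumes, as the comment says, that removal subtracts whatever `RMNode#totalCapability` holds when the scheduler handles NODE_REMOVED.

```java
// Toy arithmetic (GB): the RMNode's totalCapability is either overwritten before
// the scheduler handles NODE_REMOVED or left alone until after it.
class DirectOverwrite {
  static int reconnect(boolean overwriteBeforeRemove) {
    int oldCap = 5;
    int newCap = 10;
    int cluster = 5 + oldCap;   // another 5 GB node plus this one
    int reported = oldCap;      // stands in for RMNode#totalCapability
    if (overwriteBeforeRemove) {
      reported = newCap;        // capability set directly, too early
    }
    cluster -= reported;        // scheduler handles NODE_REMOVED
    reported = newCap;
    cluster += reported;        // scheduler handles NODE_ADDED
    return cluster;
  }
}
```

Without the early overwrite the cluster ends at the expected 15 GB; with it, the removal takes away 10 GB instead of 5 GB and the total ends at 10 GB.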


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-27 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340653#comment-14340653
 ] 

Jian He commented on YARN-3222:
---

Actually, we may not need to send the NODE_USABLE event if the node was
already in the RUNNING state, right?
Also, we can make the following two condition checks consistent by checking for
RUNNING:
{code}
if (rmNode.getState() != NodeState.UNHEALTHY) {
  // Only add new node if old state is not UNHEALTHY

if (rmNode.getState().equals(NodeState.RUNNING)) {
{code}


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-27 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340618#comment-14340618
 ] 

Jian He commented on YARN-3222:
---

LGTM. Are the test failures related?


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340041#comment-14340041
 ] 

Hadoop QA commented on YARN-3222:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701317/0002-YARN-3222.patch
  against trunk revision 48c7ee7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
  org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6779//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6779//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6779//console

This message is automatically generated.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-27 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339972#comment-14339972
 ] 

Rohith commented on YARN-3222:
--

Updated the patch to handle the following scenarios:
# Avoid sending a NODE_RESOURCE_UPDATE event to the scheduler from RMNode when a NODE_ADDED event has already been sent.
# Send the NODE_USABLE event only if the reconnected node is healthy.
# Update the resource {{totalCapability}} in RMNode if the reconnected node is the same as the old node.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-26 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339799#comment-14339799
 ] 

Rohith commented on YARN-3222:
--

bq. NODE_USABLE event is sent regardless of whether the reconnected node is healthy or not, which is incorrect, right?
Yes. I think it was assumed that if a node is reconnecting, its NM is healthy. It is better to retain the old state, i.e. UNHEALTHY; on the first subsequent heartbeat, the NodeStatus can move the node from UNHEALTHY to RUNNING.

I see another potential issue: if the old node is retained, the RMNode's {{totalCapability}} has to be updated with the new node's resource, but in the current flow {{totalCapability}} is not updated. As a result, the scheduler has the updated resource value while the RMNode keeps the stale one, so any client reading the node capability from the RMNode gets a wrong resource value.
{code}
if (noRunningApps) {
  // some code
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodeRemovedSchedulerEvent(rmNode));

  if (rmNode.getHttpPort() == newNode.getHttpPort()) {
    if (rmNode.getState() != NodeState.UNHEALTHY) {
      // Only add new node if old state is not UNHEALTHY
      rmNode.context.getDispatcher().getEventHandler().handle(
          new NodeAddedSchedulerEvent(newNode)); // NEW NODE CAPABILITY SHOULD BE UPDATED TO OLD NODE
    }
  } else {
    // Reconnected node differs, so replace old node and start new node
    rmNode.context.getDispatcher().getEventHandler().handle(
        new RMNodeStartedEvent(newNode.getNodeID(), null, null)); // No need to update totalCapability since old node is replaced with new node.
  }
}
{code}
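
A minimal sketch of the {{totalCapability}} point, under simplifying assumptions: `ToyRMNode` and `CapabilitySync.reconnect` are hypothetical stand-ins for RMNodeImpl and the no-running-apps branch above, the health check is omitted, and memory in MB stands in for the full resource. It shows that when the old RMNode is retained, the scheduler gets the new capability through NODE_ADDED while the RMNode keeps the stale value unless it is copied over.

```java
// Hypothetical, simplified stand-in for RMNodeImpl: only the fields needed here.
class ToyRMNode {
  final String nodeId;
  final int httpPort;
  int totalMemoryMb; // stands in for totalCapability

  ToyRMNode(String nodeId, int httpPort, int totalMemoryMb) {
    this.nodeId = nodeId;
    this.httpPort = httpPort;
    this.totalMemoryMb = totalMemoryMb;
  }
}

class CapabilitySync {
  /**
   * Models the no-running-apps reconnect branch and returns the memory the
   * scheduler holds for the node afterwards (it is re-added from newNode).
   */
  static int reconnect(ToyRMNode rmNode, ToyRMNode newNode, boolean syncCapability) {
    if (rmNode.httpPort == newNode.httpPort && syncCapability) {
      // Proposed fix: the old RMNode is retained, so copy the reconnected
      // node's capability into it to keep it consistent with the scheduler.
      rmNode.totalMemoryMb = newNode.totalMemoryMb;
    }
    // With a different http port the old node is replaced by newNode, so nothing is stale.
    return newNode.totalMemoryMb;
  }

  public static void main(String[] args) {
    ToyRMNode stale = new ToyRMNode("127.0.0.1:1234", 8042, 4096);
    int schedulerMb = reconnect(stale, new ToyRMNode("127.0.0.1:1234", 8042, 8192), false);
    System.out.println(schedulerMb + " vs " + stale.totalMemoryMb); // 8192 vs 4096

    ToyRMNode synced = new ToyRMNode("127.0.0.1:1234", 8042, 4096);
    schedulerMb = reconnect(synced, new ToyRMNode("127.0.0.1:1234", 8042, 8192), true);
    System.out.println(schedulerMb + " vs " + synced.totalMemoryMb); // 8192 vs 8192
  }
}
```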


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-26 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339270#comment-14339270
 ] 

Jian He commented on YARN-3222:
---

Looks good to me.
While looking at this, I may have found another bug: the NODE_USABLE event is sent regardless of whether the reconnected node is healthy or not, which is incorrect, right?
{code}
  rmNode.context.getDispatcher().getEventHandler().handle(
  new NodesListManagerEvent(
  NodesListManagerEventType.NODE_USABLE, rmNode));
{code}
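
A minimal sketch of the health check being suggested, assuming a simplified model in which the reconnect path records event names in a list instead of dispatching them; `HealthGatedReconnect` and its two-value `NodeState` enum are hypothetical. NODE_USABLE is emitted only when the retained node was not UNHEALTHY; an UNHEALTHY node keeps its state and is left for a later heartbeat.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: event names are recorded instead of dispatched.
class HealthGatedReconnect {
  enum NodeState { RUNNING, UNHEALTHY }

  /** Events a same-port reconnect with no running apps would emit. */
  static List<String> eventsOnReconnect(NodeState previousState) {
    List<String> events = new ArrayList<>();
    events.add("NODE_REMOVED");
    if (previousState != NodeState.UNHEALTHY) {
      events.add("NODE_ADDED");
      // Only a node that is not UNHEALTHY is announced as usable.
      events.add("NODE_USABLE");
    }
    // An UNHEALTHY node keeps its state; a later healthy heartbeat can move it to RUNNING.
    return events;
  }

  public static void main(String[] args) {
    System.out.println(eventsOnReconnect(NodeState.RUNNING));   // [NODE_REMOVED, NODE_ADDED, NODE_USABLE]
    System.out.println(eventsOnReconnect(NodeState.UNHEALTHY)); // [NODE_REMOVED]
  }
}
```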


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-24 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335958#comment-14335958
 ] 

Rohith commented on YARN-3222:
--

[~jianhe], kindly review the analysis and patch. I had a look at the test failures and I don't think they are related to this patch.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1406#comment-1406
 ] 

Hadoop QA commented on YARN-3222:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700180/0001-YARN-3222.patch
  against trunk revision fe7a302.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
  org.apache.hadoop.yarn.server.resourcemanager.reservation.TestFairReservationSystem

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6698//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6698//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6698//console

This message is automatically generated.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-23 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333252#comment-14333252
 ] 

Rohith commented on YARN-3222:
--

Kindly review the patch. Since no tests were added, I verified the patch manually by deploying it in a cluster.
In the patch, I have moved handlingRunningApplications inside the else block; it does not need to be common to the noAppsRunning case.


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-20 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329136#comment-14329136
 ] 

Rohith commented on YARN-3222:
--

I see two ways of fixing the issue:
# Always send the NODE_RESOURCE_UPDATE event to the scheduler via RMNodeEventType.RESOURCE_UPDATE of the RMNode.
# When the NODE_ADDED event has been sent to the scheduler, sending a NODE_RESOURCE_UPDATE event for the same node from ReconnectNodeTransition is a duplicate update request, because the scheduler has already updated its resources with the newly added node, i.e. NODE_REMOVED --> NODE_ADDED --> NODE_RESOURCE_UPDATE. So if NO applications are running on the node, it is not necessary to send the NODE_RESOURCE_UPDATE request.

I would prefer the 2nd option because it avoids one duplicate resource update.
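
The two options can be compared by the scheduler event sequences they are meant to produce. This is a sketch under simplifying assumptions: `ReconnectOptions`, `option1` and `option2` are hypothetical, events are recorded as strings, and asynchronous dispatch is ignored so that each list shows only the intended order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of the two proposals; event names are recorded, not dispatched.
class ReconnectOptions {
  /** Option 1: the resource update always goes through RMNodeEventType.RESOURCE_UPDATE. */
  static List<String> option1(boolean noRunningApps) {
    List<String> events = new ArrayList<>();
    if (noRunningApps) {
      events.add("NODE_REMOVED");
      events.add("NODE_ADDED");
    }
    // Intended to follow any re-add, so the scheduler can find the node.
    events.add("NODE_RESOURCE_UPDATE");
    return events;
  }

  /** Option 2: skip the resource update when NODE_ADDED already carries the new resource. */
  static List<String> option2(boolean noRunningApps) {
    List<String> events = new ArrayList<>();
    if (noRunningApps) {
      events.add("NODE_REMOVED");
      events.add("NODE_ADDED");
    } else {
      // The node stays registered in the scheduler; only its resource changes.
      events.add("NODE_RESOURCE_UPDATE");
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(option1(true)); // [NODE_REMOVED, NODE_ADDED, NODE_RESOURCE_UPDATE]
    System.out.println(option2(true)); // [NODE_REMOVED, NODE_ADDED]
  }
}
```

With no running applications, option 2 emits one event fewer, which is the duplicate update the comment refers to.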


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-18 Thread Rohith (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326965#comment-14326965
 ] 

Rohith commented on YARN-3222:
--

Attaching the logs, which give more information about the issue. In the log below, the RM shuts down with an NPE while updating the node resource. Observe the scheduler events dispatched from AsyncDispatcher in *org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.\**: the order is NODE_REMOVED --> NODE_RESOURCE_UPDATE --> NODE_ADDED --> NODE_LABELS_UPDATE.
{noformat}
2015-02-19 09:14:57,212 INFO  [main] util.RackResolver (RackResolver.java:coreResolve(109)) - Resolved 127.0.0.1 to /default-rack
2015-02-19 09:14:57,213 INFO  [main] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(313)) - Reconnect from the node at: 127.0.0.1
2015-02-19 09:14:57,215 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeReconnectEvent.EventType: RECONNECTED
2015-02-19 09:14:57,215 INFO  [main] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(343)) - NodeManager from node 127.0.0.1(cmPort: 1234 httpPort: 3) registered with capability: , assigned nodeId 127.0.0.1:1234
2015-02-19 09:14:57,215 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(412)) - Processing 127.0.0.1:1234 of type RECONNECTED
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeRemovedSchedulerEvent.EventType: NODE_REMOVED
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStartedEvent.EventType: STARTED
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(412)) - Processing 127.0.0.1:1234 of type STARTED
2015-02-19 09:14:57,266 INFO  [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(424)) - 127.0.0.1:1234 Node Transitioned from NEW to RUNNING
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEvent.EventType: NODE_USABLE
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeResourceUpdateSchedulerEvent.EventType: NODE_RESOURCE_UPDATE
2015-02-19 09:14:57,267 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeAddedSchedulerEvent.EventType: NODE_ADDED
2015-02-19 09:14:57,267 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEvent.EventType: NODE_USABLE
2015-02-19 09:14:57,267 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType: NODE_LABELS_UPDATE
2015-02-19 09:14:57,267 INFO  [ResourceManager Event Processor] capacity.CapacityScheduler (CapacityScheduler.java:removeNode(1267)) - Removed node 127.0.0.1:1234 clusterResource: 
2015-02-19 09:14:57,267 FATAL [ResourceManager Event Processor] resourcemanager.ResourceManager (ResourceManager.java:run(688)) - Error in handling event type NODE_RESOURCE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:548)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeAndQueueResource(CapacityScheduler.java:992)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1119)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:679)
	at java.lang.Thread.run(Thread.java:745)
2015-02-19 09:14:57,280 INFO  [ResourceManager Event Processor] resourcemanager.ResourceManager (ResourceManager.java:run(692)) - Exiting, bbye..
{noformat}
