[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-10-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960972#comment-14960972
 ] 

Junping Du commented on YARN-3896:
--

I think we may also need to fix 
NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM() so that it 
retries registration with the RM when it hits a non-fatal exception. I will 
file a separate JIRA to fix/discuss this.
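
A minimal sketch of the retry idea in the comment above, assuming the registration call can be wrapped in a Callable and that the caller can tell fatal exceptions from non-fatal ones. It does not reproduce NodeStatusUpdaterImpl; RegistrationRetrySketch, registerWithRetry, isFatal, maxAttempts and sleepMillis are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class RegistrationRetrySketch {

  /**
   * Calls register until it succeeds, a fatal exception is seen,
   * or maxAttempts attempts have been made.
   */
  static <T> T registerWithRetry(Callable<T> register,
                                 Predicate<Exception> isFatal,
                                 int maxAttempts,
                                 long sleepMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return register.call();
      } catch (Exception e) {
        if (isFatal.test(e)) {
          // Fatal errors are not retried.
          throw new IllegalStateException("Fatal registration failure", e);
        }
        last = e; // non-fatal: remember it and try again
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
      }
    }
    throw new IllegalStateException(
        "Registration failed after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // Stand-in for the real registration call: fails twice, then succeeds.
    String result = registerWithRetry(() -> {
      if (calls.incrementAndGet() < 3) {
        throw new IOException("RM not reachable yet");
      }
      return "registered";
    }, e -> e instanceof IllegalArgumentException, 5, 10L);
    System.out.println(result + " after " + calls.get() + " attempts");
  }
}
```

Which exceptions count as fatal, and how long to wait between attempts, are exactly the points the separate JIRA would need to settle; the predicate and the fixed sleep here are placeholders.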

> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset synchronously
> -
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: resourcemanager
> Fix For: 2.7.2, 2.6.2
>
> Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
> YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
> YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node (10.208.132.153) reconnected with the RM. When it registered, the RM 
> reset its lastNodeHeartbeatResponse's id to 0 asynchronously, but the node's 
> heartbeat arrived before the RM had finished resetting the id to 0.
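
The race in the description can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the Hadoop code: ResponseIdRaceSketch and its methods are hypothetical, the event queue stands in for the RM's asynchronous dispatcher, and the heartbeat check is reduced to the "too far behind" comparison seen in the log.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the race; all names are hypothetical, not Hadoop classes. */
final class ResponseIdRaceSketch {
  enum Outcome { OK, RESYNC }

  private int lastResponseId;
  private final Queue<Runnable> pendingEvents = new ArrayDeque<>();

  ResponseIdRaceSketch(int lastResponseId) {
    this.lastResponseId = lastResponseId;
  }

  /** Registration that only queues the reset; it takes effect on drainEvents(). */
  void registerAsync() {
    pendingEvents.add(() -> lastResponseId = 0);
  }

  /** Registration that resets the id before returning. */
  void registerSync() {
    lastResponseId = 0;
  }

  /** Runs queued events, standing in for the asynchronous dispatcher. */
  void drainEvents() {
    Runnable event;
    while ((event = pendingEvents.poll()) != null) {
      event.run();
    }
  }

  /** Simplified heartbeat check: a node far behind the RM's id is told to resync. */
  Outcome heartbeat(int nmResponseId) {
    if (nmResponseId + 1 < lastResponseId) {
      return Outcome.RESYNC; // corresponds to "Too far behind" in the log
    }
    lastResponseId = nmResponseId + 1;
    return Outcome.OK;
  }

  public static void main(String[] args) {
    ResponseIdRaceSketch racy = new ResponseIdRaceSketch(2506413);
    racy.registerAsync();                    // reset is queued, not yet applied
    System.out.println(racy.heartbeat(0));   // prints RESYNC

    ResponseIdRaceSketch fixed = new ResponseIdRaceSketch(2506413);
    fixed.registerSync();                    // reset applied before the heartbeat
    System.out.println(fixed.heartbeat(0));  // prints OK
  }
}
```

In the model, the heartbeat is rejected only when it is processed before the queued reset runs, which matches the ordering of the log lines above (registration at 16:49:39,075, "Too far behind" at 16:49:39,104).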



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-10-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960991#comment-14960991
 ] 

Junping Du commented on YARN-3896:
--

Filed YARN-4274.



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709144#comment-14709144
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2229 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2229/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java


 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset synchronously
 -

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
  Labels: resourcemanager
 Fix For: 2.8.0

 Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
 YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
 YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node (10.208.132.153) reconnected with the RM. When it registered, the RM 
 reset its lastNodeHeartbeatResponse's id to 0 asynchronously, but the node's 
 heartbeat arrived before the RM had finished resetting the id to 0.





[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708825#comment-14708825
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8343 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8343/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt







[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708845#comment-14708845
 ] 

Jun Gong commented on YARN-3896:


Thanks [~rohithsharma] for the review and commit, and [~devaraj.k] for the 
review. 



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708925#comment-14708925
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1032 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1032/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708951#comment-14708951
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #299 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/299/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708911#comment-14708911
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #303 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/303/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709091#comment-14709091
 ] 

Hudson commented on YARN-3896:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2248 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2248/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709128#comment-14709128
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #291 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/291/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java


 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset synchronously
 -

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
  Labels: resourcemanager
 Fix For: 2.8.0

 Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
 YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
 YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node (10.208.132.153) reconnected with the RM. When it registered, the RM 
 reset its lastNodeHeartbeatResponse's id to 0 asynchronously, but the node's 
 next heartbeat arrived before the RM had finished resetting the id to 0.
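
The race described above can be sketched in a few lines of Java. This is a simplified, single-threaded model and not the actual ResourceTrackerService code: the class ResponseIdRaceSketch, the NodeRecord type, and the pending-event queue are hypothetical names introduced only for illustration, and the acceptance check is a reduced version of the "too far behind" comparison suggested by the log.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ResponseIdRaceSketch {

    /** Hypothetical stand-in for the RM-side record of a node. */
    static final class NodeRecord {
        // Last response id the RM handed out before the node reconnected
        // (value taken from the log above).
        int lastResponseId = 2506413;

        void resetLastResponseId() {
            lastResponseId = 0;
        }
    }

    /**
     * Simplified heartbeat check (an assumption, not the real RM logic):
     * accept a matching id, tolerate a duplicate heartbeat, and otherwise
     * treat the node as too far behind.
     */
    static boolean heartbeat(NodeRecord node, int nmResponseId) {
        if (nmResponseId == node.lastResponseId) {
            node.lastResponseId++;
            return true;
        }
        if (nmResponseId + 1 == node.lastResponseId) {
            return true; // duplicate heartbeat
        }
        return false; // too far behind: the node would be marked REBOOTED
    }

    /** Returns whether the first heartbeat after a reconnect is accepted. */
    public static boolean reconnectThenHeartbeat(boolean resetSynchronously) {
        NodeRecord node = new NodeRecord();
        Queue<Runnable> pendingEvents = new ArrayDeque<>();

        // Registration: reset the id either inline or via a queued event.
        if (resetSynchronously) {
            node.resetLastResponseId();
        } else {
            pendingEvents.add(node::resetLastResponseId);
        }

        // The node's heartbeat (response id 0) arrives before the queue drains.
        boolean accepted = heartbeat(node, 0);

        // The queued reset runs only afterwards, which is too late.
        while (!pendingEvents.isEmpty()) {
            pendingEvents.poll().run();
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println("async reset accepted: " + reconnectThenHeartbeat(false));
        System.out.println("sync reset accepted: " + reconnectThenHeartbeat(true));
    }
}
```

In this model the heartbeat is rejected when the reset is queued and accepted when the reset happens during registration, which is the behaviour the issue title refers to.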



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708800#comment-14708800
 ] 

Hadoop QA commented on YARN-3896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 53s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 10s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  8s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 52s | Tests passed in hadoop-sls. |
| {color:red}-1{color} | yarn tests |  51m 13s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | |  92m 42s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12751826/YARN-3896.07.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b71c600 |
| hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8894/artifact/patchprocess/testrun_hadoop-sls.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8894/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8894/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8894/console |


This message was automatically generated.



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708810#comment-14708810
 ] 

Rohith Sharma K S commented on YARN-3896:
-

The test failures are unrelated to the patch. Committing shortly.



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-21 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707720#comment-14707720
 ] 

Jun Gong commented on YARN-3896:


[~rohithsharma] Thanks for the help. I have re-submitted the same patch as 
yours to trigger Jenkins.



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-21 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706319#comment-14706319
 ] 

Rohith Sharma K S commented on YARN-3896:
-

Thanks [~hex108] for the patch; overall it looks good to me. I verified that 
the test fails every time without the source changes.
nit: Can you add the public modifier to the interface API, i.e. 
{{void resetLastNodeHeartBeatResponse();}}?



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-21 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706464#comment-14706464
 ] 

Rohith Sharma K S commented on YARN-3896:
-

Thanks for the clarification.



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-21 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706327#comment-14706327
 ] 

Jun Gong commented on YARN-3896:


Thanks [~rohithsharma] for the review.

RMNode is a public interface. Checkstyle will report a 'Redundant public 
modifier' error if a public modifier is added to the method.
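
As a small illustration of why the modifier is redundant, the sketch below declares a method in an interface without any modifier and checks through reflection that it is public anyway. NodeView is a hypothetical, trimmed-down stand-in for RMNode, and InterfaceModifierSketch is an illustrative class name; neither is part of the patch.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class InterfaceModifierSketch {

    /** Hypothetical, trimmed-down stand-in for the RMNode interface. */
    public interface NodeView {
        // No modifier written: interface methods are implicitly public
        // and abstract.
        void resetLastNodeHeartBeatResponse();
    }

    /** Reports whether the undecorated interface method is public. */
    public static boolean isImplicitlyPublic() {
        try {
            Method m = NodeView.class
                .getDeclaredMethod("resetLastNodeHeartBeatResponse");
            return Modifier.isPublic(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("implicitly public: " + isImplicitlyPublic());
    }
}
```

Since the language already makes the method public, writing the keyword adds nothing, which is what the checkstyle rule flags.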



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-04 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653425#comment-14653425
 ] 

Jun Gong commented on YARN-3896:


[~devaraj.k], could you please help review the latest patch?  Thanks.



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-23 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638745#comment-14638745
 ] 

Jun Gong commented on YARN-3896:


[~devaraj.k], could you please help review the patch? Thanks. 



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-23 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638944#comment-14638944
 ] 

Jun Gong commented on YARN-3896:


[~devaraj.k], I just attached a new patch that removes all sleep statements; 
the other comments are also addressed in the patch.



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639100#comment-14639100
 ] 

Hadoop QA commented on YARN-3896:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 43s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 34s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  6s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 52s | Tests passed in hadoop-sls. |
| {color:green}+1{color} | yarn tests |  52m  2s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | |  92m 31s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12746814/YARN-3896.06.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / ee98d63 |
| hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8636/artifact/patchprocess/testrun_hadoop-sls.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8636/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8636/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8636/console |


This message was automatically generated.



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-20 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633469#comment-14633469
 ] 

Devaraj K commented on YARN-3896:
-

Thanks [~hex108] for the updated patch. 

Here are some comments about the test.
# Can we have a separate new test for this case instead of adding it to an 
existing test?
# Can you avoid mentioning the JIRA ID in the comment?
   {code:xml}+// Simulate scenario from YARN-3896:{code}
# There are multiple sleep statements with hard-coded values in the newly added 
test code. Can you avoid these sleeps with hard-coded timeouts?
# Also, if I run the test without the source changes, it fails with the 
message "node shouldn't be null". Can we check for the REBOOTED state here?
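
One common way to avoid fixed sleeps in a test is to poll for the expected condition with a bounded timeout. The sketch below is only an illustration of that idea under assumed names: WaitForStateSketch, its waitFor helper, and the NodeState enum are hypothetical and are not taken from the patch or from Hadoop's test utilities.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

public class WaitForStateSketch {

    /** Hypothetical node states used only in this sketch. */
    enum NodeState { RUNNING, REBOOTED }

    /**
     * Polls the condition every pollMs milliseconds until it holds or
     * timeoutMs elapses. Returns true if the condition was met in time.
     */
    public static boolean waitFor(BooleanSupplier condition,
                                  long pollMs, long timeoutMs) {
        long deadline = System.nanoTime()
            + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (!condition.getAsBoolean()) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // timed out
                }
                Thread.sleep(pollMs);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Simulates an asynchronous event handler and waits for its effect. */
    public static boolean demo() {
        AtomicReference<NodeState> state =
            new AtomicReference<>(NodeState.REBOOTED);

        // Stand-in for an event being processed on another thread.
        Thread handler = new Thread(() -> state.set(NodeState.RUNNING));
        handler.start();

        // Wait for the observable outcome rather than guessing a sleep time.
        return waitFor(() -> state.get() == NodeState.RUNNING, 10, 5000);
    }

    public static void main(String[] args) {
        System.out.println("state reached: " + demo());
    }
}
```

The test then returns as soon as the state is reached and fails after a clear upper bound, instead of depending on a guessed delay.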



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-20 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633640#comment-14633640
 ] 

Jun Gong commented on YARN-3896:


Thanks [~devaraj.k] for the review and comments.

I have uploaded a new patch to address your comments.

{quote}
There are multiple sleep statements with hard coded values in the newly added 
test code. Can you avoid these sleep with hard coded timeouts?
{quote}
The sleep statements are there for two reasons: 1. to simulate the RM being 
busy handling an RMNodeEvent; 2. to wait until the event has been processed. 
Is that reasonable?



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-20 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633737#comment-14633737
 ] 

Jun Gong commented on YARN-3896:


The failed test cases are not related to this patch.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch, YARN-3896.02.patch, 
 YARN-3896.03.patch, YARN-3896.04.patch, YARN-3896.05.patch




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633717#comment-14633717
 ] 

Hadoop QA commented on YARN-3896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 46s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 32s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  3s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 17s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | tools/hadoop tests |   0m 22s | Tests failed in 
hadoop-sls. |
| {color:red}-1{color} | yarn tests |  52m 47s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 41s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.sls.nodemanager.TestNMSimulator |
|   | hadoop.yarn.sls.appmaster.TestAMSimulator |
|   | hadoop.yarn.sls.TestSLSRunner |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12746111/YARN-3896.05.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 98c2bc8 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8585/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8585/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8585/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8585/console |


This message was automatically generated.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch, YARN-3896.02.patch, 
 YARN-3896.03.patch, YARN-3896.04.patch, YARN-3896.05.patch




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-10 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623199#comment-14623199
 ] 

Jun Gong commented on YARN-3896:


The failed test cases are not related to this patch; they are addressed in 
YARN-3909 and YARN-3910. Please review the patch.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch, YARN-3896.02.patch, 
 YARN-3896.03.patch, YARN-3896.04.patch




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622334#comment-14622334
 ] 

Hadoop QA commented on YARN-3896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 41s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  4s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 14s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 52s | Tests passed in 
hadoop-sls. |
| {color:red}-1{color} | yarn tests |  51m  3s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  91m 32s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744706/YARN-3896.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b489080 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8496/artifact/patchprocess/whitespace.txt
 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8496/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8496/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8496/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8496/console |


This message was automatically generated.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch, YARN-3896.02.patch, 
 YARN-3896.03.patch




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622581#comment-14622581
 ] 

Hadoop QA commented on YARN-3896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 49s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 38s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  7s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 51s | Tests passed in 
hadoop-sls. |
| {color:red}-1{color} | yarn tests |  51m  2s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  91m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744729/YARN-3896.04.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b489080 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8497/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8497/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8497/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8497/console |


This message was automatically generated.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch, YARN-3896.02.patch, 
 YARN-3896.03.patch, YARN-3896.04.patch




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-09 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620672#comment-14620672
 ] 

Jun Gong commented on YARN-3896:


[~devaraj.k], a test case has been added in the new patch. Thanks for reviewing.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch, YARN-3896.02.patch




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620839#comment-14620839
 ] 

Hadoop QA commented on YARN-3896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 16s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 49s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  51m  5s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  88m 15s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744513/YARN-3896.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fffb15b |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8481/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8481/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8481/console |


This message was automatically generated.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch, YARN-3896.02.patch




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618407#comment-14618407
 ] 

Devaraj K commented on YARN-3896:
-

Good find, [~hex108].

I think we need to reset the responseId to 0 as part of registerNodeManager 
itself, before triggering RMNodeReconnectEvent, instead of handling it as 
part of ReconnectNodeTransition.
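
As a rough sketch of this suggestion (an assumption-laden model, not the patch itself): the queue below stands in for the RM's asynchronous dispatcher, and every name in the class is hypothetical. It contrasts resetting the id inside the queued reconnect handling with resetting it inline during registration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model; names do not come from the Hadoop code base.
class SyncResetSketch {
    int lastResponseId = 2506413;
    // Stands in for the asynchronous event dispatcher.
    final Queue<Runnable> eventQueue = new ArrayDeque<>();

    // Original behaviour: the reset runs only when the queued reconnect
    // event is eventually processed.
    void registerAsyncReset() {
        eventQueue.add(() -> lastResponseId = 0);
    }

    // Suggested behaviour: reset inline during registration, then queue
    // the remaining reconnect handling.
    void registerSyncReset() {
        lastResponseId = 0;
        eventQueue.add(() -> { /* remaining reconnect handling */ });
    }

    // Assumed check: a heartbeat more than one id behind is rejected.
    boolean heartbeatAccepted(int nmResponseId) {
        return nmResponseId + 1 >= lastResponseId;
    }

    public static void main(String[] args) {
        SyncResetSketch before = new SyncResetSketch();
        before.registerAsyncReset();
        // Heartbeat arrives before the queued event is drained.
        System.out.println(before.heartbeatAccepted(0)); // false

        SyncResetSketch after = new SyncResetSketch();
        after.registerSyncReset();
        System.out.println(after.heartbeatAccepted(0)); // true
    }
}
```

With the inline reset, the outcome no longer depends on how quickly the queued event is drained.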

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong



[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618750#comment-14618750
 ] 

Devaraj K commented on YARN-3896:
-

Thanks [~hex108] for delivering the patch quickly.

Can you also add a test to simulate the scenario as part of the patch?


 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch




[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618812#comment-14618812
 ] 

Hadoop QA commented on YARN-3896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 13s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 40s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 24s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  50m 59s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  87m 46s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12744222/YARN-3896.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bd4e109 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8455/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8455/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8455/console |


This message was automatically generated.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: YARN-3896.01.patch

