[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726498#comment-14726498 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2279 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2279/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
> -----------------------------------------------------------------------------------
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Priority: Critical
> Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the
> scheduler with node_added, node_removed, or node_resource_update events. These
> events should be sent in sequential order, i.e. the node_added event first and
> the node_resource_update event after it.
> But if the node is reconnected with a different HTTP port, the order of
> scheduler events is node_removed --> node_resource_update --> node_added,
> so the scheduler cannot find the node and throws an NPE, and the RM exits.
> The node_resource_update event should always be triggered via
> RMNodeEventType.RESOURCE_UPDATE.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
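The ordering problem described in the issue can be illustrated with a minimal, self-contained sketch. It does not use the real YARN classes: `ToyScheduler`, `Event`, `EventType`, the node id `host1:1234`, and the memory sizes are all hypothetical stand-ins chosen for illustration. The only behaviour taken from the issue text is that a resource update for a node the scheduler does not know about ends in a NullPointerException, and that sending node_added before node_resource_update avoids it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the event ordering; none of these types are YARN's own.
class ReconnectOrderingSketch {

    enum EventType { NODE_ADDED, NODE_REMOVED, NODE_RESOURCE_UPDATE }

    static final class Event {
        final EventType type;
        final String nodeId;
        final int memoryMb;

        Event(EventType type, String nodeId, int memoryMb) {
            this.type = type;
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
        }
    }

    // Toy scheduler: tracks one memory value per known node.
    static final class ToyScheduler {
        private final Map<String, int[]> nodes = new HashMap<>();

        void handle(Event e) {
            switch (e.type) {
                case NODE_ADDED:
                    nodes.put(e.nodeId, new int[] { e.memoryMb });
                    break;
                case NODE_REMOVED:
                    nodes.remove(e.nodeId);
                    break;
                case NODE_RESOURCE_UPDATE:
                    // For an unknown node, get() returns null and the array
                    // access below throws NullPointerException.
                    nodes.get(e.nodeId)[0] = e.memoryMb;
                    break;
            }
        }

        Integer memoryOf(String nodeId) {
            int[] slot = nodes.get(nodeId);
            return slot == null ? null : slot[0];
        }
    }

    // A scheduler that already knows the node, as it would before a reconnect.
    static ToyScheduler registeredScheduler() {
        ToyScheduler s = new ToyScheduler();
        s.handle(new Event(EventType.NODE_ADDED, "host1:1234", 4096));
        return s;
    }

    // Order reported in the issue: the update arrives while the node is absent.
    static List<Event> buggyOrder() {
        return Arrays.asList(
            new Event(EventType.NODE_REMOVED, "host1:1234", 0),
            new Event(EventType.NODE_RESOURCE_UPDATE, "host1:1234", 8192),
            new Event(EventType.NODE_ADDED, "host1:1234", 4096));
    }

    // Sequential order asked for in the issue: add first, then update.
    static List<Event> fixedOrder() {
        return Arrays.asList(
            new Event(EventType.NODE_REMOVED, "host1:1234", 0),
            new Event(EventType.NODE_ADDED, "host1:1234", 4096),
            new Event(EventType.NODE_RESOURCE_UPDATE, "host1:1234", 8192));
    }

    // Returns true if every event was handled, false if one hit the NPE.
    static boolean replay(ToyScheduler scheduler, List<Event> events) {
        try {
            for (Event e : events) {
                scheduler.handle(e);
            }
            return true;
        } catch (NullPointerException npe) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy order handled: "
            + replay(registeredScheduler(), buggyOrder()));
        System.out.println("fixed order handled: "
            + replay(registeredScheduler(), fixedOrder()));
    }
}
```

In the sketch the exception is caught so both orders can be compared in one run; per the issue description, in the ResourceManager the same failure causes the RM to exit.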
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726377#comment-14726377 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #330 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/330/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726371#comment-14726371 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #1065 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1065/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726373#comment-14726373 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #321 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/321/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726376#comment-14726376 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2260 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2260/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726378#comment-14726378 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #338 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/338/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726216#comment-14726216 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #8382 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8382/])
YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642813#comment-14642813 ]

Sangjin Lee commented on YARN-3222:
-----------------------------------

The merge to 2.6.0 is straightforward.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347037#comment-14347037 ]

Hudson commented on YARN-3222:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347000#comment-14347000 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346934#comment-14346934 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #113 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/113/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346925#comment-14346925 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2054 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2054/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346720#comment-14346720 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #856 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/856/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346710#comment-14346710 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/122/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346104#comment-14346104 ]

Hudson commented on YARN-3222:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7248 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7248/])
YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346069#comment-14346069 ]

Jian He commented on YARN-3222:
-------------------------------

Thanks! Committing.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346013#comment-14346013 ]

Rohith commented on YARN-3222:
------------------------------

Had a glance at the javac and javadoc warnings; they look unrelated to the patch.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345987#comment-14345987 ]
Hadoop QA commented on YARN-3222:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12702276/0005-YARN-3222.patch
against trunk revision e17e5ba.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1151 javac compiler warnings (more than the trunk's current 185 warnings).
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavadocWarnings.txt for details.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-distcp.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6828//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6828//console
This message is automatically generated.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345909#comment-14345909 ]
Rohith commented on YARN-3222:
bq. check you added earlier about sending NodeResourceUpdate event only if the node resource is different
Agreed. I have updated the patch to address the above comment. Kindly review it.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345513#comment-14345513 ]
Jian He commented on YARN-3222:
Thanks, Rohith! I think the condition check you added earlier, which sends the NodeResourceUpdate event only if the node resource is different, is useful because it saves some traffic. Would you mind adding that too?
{code}
if (rmNode.getState().equals(NodeState.RUNNING)) {
  // Update scheduler node's capacity for reconnect node.
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodeResourceUpdateSchedulerEvent(rmNode,
          ResourceOption.newInstance(newNode.getTotalCapability(), -1)));
}
{code}
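The condition check discussed in this comment can be sketched roughly as follows. This is a minimal, self-contained model and not the patch itself: `NodeCapability`, `ReconnectUpdateSketch` and the string event name are hypothetical stand-ins for the YARN `Resource` type and `NodeResourceUpdateSchedulerEvent`, and the comparison on memory and vcores is an assumption about what "different resource" means.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node's capability; not the YARN Resource class.
final class NodeCapability {
    final int memoryMb;
    final int vcores;

    NodeCapability(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    boolean sameAs(NodeCapability other) {
        return memoryMb == other.memoryMb && vcores == other.vcores;
    }
}

final class ReconnectUpdateSketch {
    // Returns the scheduler events a reconnect would emit in this model.
    // An update is sent only for a RUNNING node whose capability changed.
    static List<String> eventsOnReconnect(boolean running,
                                          NodeCapability oldCap,
                                          NodeCapability newCap) {
        List<String> events = new ArrayList<>();
        if (running && !oldCap.sameAs(newCap)) {
            events.add("NODE_RESOURCE_UPDATE");
        }
        return events;
    }

    public static void main(String[] args) {
        NodeCapability fiveGb = new NodeCapability(5120, 4);
        NodeCapability tenGb = new NodeCapability(10240, 4);
        System.out.println(eventsOnReconnect(true, fiveGb, fiveGb)); // unchanged capability
        System.out.println(eventsOnReconnect(true, fiveGb, tenGb));  // changed capability
    }
}
```

In this model a reconnect with an unchanged capability produces no scheduler event at all, which is the traffic saving mentioned in the comment.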
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344919#comment-14344919 ]
Hadoop QA commented on YARN-3222:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12702122/0004-YARN-3222.patch
against trunk revision 9ae7f9e.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6818//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6818//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6818//console
This message is automatically generated.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344802#comment-14344802 ]
Rohith commented on YARN-3222:
Kindly review the updated patch, which fixes points 1 and 2 mentioned in the earlier comment.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344790#comment-14344790 ]
Rohith commented on YARN-3222:
To handle the 3rd point, I have raised YARN-3286.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344739#comment-14344739 ]
Rohith commented on YARN-3222:
I had a mail chat with [~jianhe] regarding the issues observed in this jira discussion, and we decided to split the jira into 2 separate jiras. The issues observed in ReconnectNodeTransition are:
# As per the defect description, the order in which the node_resource_update and node_added events are sent to the scheduler. If a node_added event is sent to the scheduler, there is no need for RMNode to send a node_resource_update event to the scheduler again.
# If the RMNode state is RUNNING, the node_usable event need not be sent.
# If a node is reconnected with a different capability, RMNode#totalCapability keeps the old capability. It has to be updated with the new capability.
Points 1 and 2 will be handled in this jira. Point 3 will be handled in a separate jira.
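Point 1 of the comment above (no node_resource_update once node_added has been sent) can be illustrated with a small ordering model. This is only a sketch under stated assumptions: the enum and the `eventsFor` helper are hypothetical, the real code dispatches `NodeRemovedSchedulerEvent`, `NodeAddedSchedulerEvent` and `NodeResourceUpdateSchedulerEvent` objects, and the two boolean inputs simplify the actual transition conditions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event names, used only for illustration.
enum SchedulerEventKind { NODE_REMOVED, NODE_ADDED, NODE_RESOURCE_UPDATE }

final class ReconnectOrderingSketch {
    /**
     * Returns the scheduler events for one reconnect, in the order sent.
     * nodeReAdded: the node is removed from and added back to the scheduler.
     * running:     the RMNode is in the RUNNING state.
     */
    static List<SchedulerEventKind> eventsFor(boolean nodeReAdded, boolean running) {
        List<SchedulerEventKind> events = new ArrayList<>();
        if (nodeReAdded) {
            events.add(SchedulerEventKind.NODE_REMOVED);
            events.add(SchedulerEventKind.NODE_ADDED);
            // The node was just added, so no separate resource update is sent.
            return events;
        }
        if (running) {
            events.add(SchedulerEventKind.NODE_RESOURCE_UPDATE);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(eventsFor(true, true));
        System.out.println(eventsFor(false, true));
    }
}
```

The early return is the design point: in this model a resource update can never be queued between the removal and the re-addition of the same node.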
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344553#comment-14344553 ]
Rohith commented on YARN-3222:
I thought of handling the race in the scenario discussed above in the following way:
# If oldNode is the same as newNode and the capability has changed, update the resource in the scheduler first.
## ClusterResource=5gb+5gb
## Update the resource with the new node capability, ClusterResource=5gb+10gb (new capability).
# Remove the node with the new capability
## ClusterResource=15gb-10gb (new capability)=5gb
# Add the node with the new capability
## ClusterResource=5gb+10gb=15gb, which is expected, and {{RMNode#totalCapability}} is 10gb
Does it make sense?
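The arithmetic in the steps above can be replayed with a small accounting model. This is a sketch only: `ClusterResourceSketch` and its methods are hypothetical and track a single integer of gigabytes, whereas the real schedulers keep a cluster `Resource` with several dimensions.

```java
// Hypothetical accounting model; tracks cluster memory in whole gigabytes.
final class ClusterResourceSketch {
    private int clusterGb;

    ClusterResourceSketch(int initialGb) {
        this.clusterGb = initialGb;
    }

    // Replaces a node's old contribution with its new one.
    void updateNode(int oldGb, int newGb) {
        clusterGb += newGb - oldGb;
    }

    void removeNode(int nodeGb) {
        clusterGb -= nodeGb;
    }

    void addNode(int nodeGb) {
        clusterGb += nodeGb;
    }

    int clusterGb() {
        return clusterGb;
    }

    // Replays update -> remove -> add and records the total after each step.
    static int[] replay() {
        ClusterResourceSketch cluster = new ClusterResourceSketch(5 + 5);
        int[] trace = new int[3];
        cluster.updateNode(5, 10);      // 5gb + 10gb
        trace[0] = cluster.clusterGb(); // 15
        cluster.removeNode(10);         // 15gb - 10gb
        trace[1] = cluster.clusterGb(); // 5
        cluster.addNode(10);            // 5gb + 10gb
        trace[2] = cluster.clusterGb(); // 15
        return trace;
    }

    public static void main(String[] args) {
        int[] trace = replay();
        System.out.println(trace[0] + " -> " + trace[1] + " -> " + trace[2]);
    }
}
```

Because the update is applied before the removal, the amount subtracted matches the amount the scheduler currently attributes to the node, so the total ends at the expected 15gb.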
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344468#comment-14344468 ]
Rohith commented on YARN-3222:
bq. I think we may not need to call sendNodeUsableEventIfNodeStateIsNotRunning to send the node_usable event in ReconnectEvent. As you said earlier, the next heartbeat will trigger this event based on the node's own health report.
Right, it is not required. I will remove this.
bq. The transition is invoked only at running and unhealthy state, so I think this is not possible?
I see.
bq. Even by sending an event it's still possible that removeNode was removing new capability from cluster resource ?
I see a potential risk even if RMNodeResourceUpdateEvent has been sent. Say AsyncDispatcher holds the events Node_removed and RMNodeResourceUpdate. AsyncDispatcher fetches Node_removed and puts it in the SchedulerEventDispatcher queue. In any case, if SchedulerEventDispatcher is delayed in processing node_removed, perhaps because there are many scheduler events, then RMNodeResourceUpdate is processed first. So there is a chance of removing the new capability from the cluster resource. Any thoughts on handling this issue?
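The delayed-removal risk described in this comment can be modelled in a few lines. This is a simplified sketch and not the ResourceManager code: `StaleRemovalSketch` and its `Node` type are hypothetical, a plain `ArrayDeque` stands in for the scheduler's event queue, and the model assumes that a removal subtracts whatever capability the node reports at the moment the removal is processed.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class StaleRemovalSketch {
    // Hypothetical node whose capability can change while events are queued.
    static final class Node {
        int capabilityGb;

        Node(int capabilityGb) {
            this.capabilityGb = capabilityGb;
        }
    }

    /** Returns the cluster total (in GB) after the removal and after the re-add. */
    static int[] replay() {
        Node other = new Node(5);
        Node reconnecting = new Node(5);
        int clusterGb = other.capabilityGb + reconnecting.capabilityGb; // 10

        // The removal is queued for the scheduler but not processed yet.
        Queue<Node> pendingRemovals = new ArrayDeque<>();
        pendingRemovals.add(reconnecting);

        // Meanwhile the node's capability is changed to the new value.
        reconnecting.capabilityGb = 10;

        // Assumption: removal reads the node's capability at processing time,
        // so 10gb is subtracted although the scheduler had only counted 5gb.
        Node removed = pendingRemovals.remove();
        clusterGb -= removed.capabilityGb;
        int afterRemoval = clusterGb; // 0, although the other 5gb node remains

        clusterGb += reconnecting.capabilityGb; // re-add with 10gb
        return new int[] { afterRemoval, clusterGb }; // {0, 10}; correct would be {5, 15}
    }

    public static void main(String[] args) {
        int[] result = replay();
        System.out.println(result[0] + " " + result[1]);
    }
}
```

Under that assumption the cluster total ends 5gb short, which is the inconsistency the comment warns about.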
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343896#comment-14343896 ]
Jian He commented on YARN-3222:
bq. I have handled this by sending RMNodeResourceUpdateEvent if there is any change in capability
Even when an event is sent, isn't it still possible that removeNode removes the new capability from the cluster resource?
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343846#comment-14343846 ]
Jian He commented on YARN-3222:
Thanks for updating. I think we may not need to call sendNodeUsableEventIfNodeStateIsNotRunning to send the node_usable event in ReconnectEvent. As you said earlier, the next heartbeat will trigger this event based on the node's own health report.
bq. It mean, node state can be decommissioned/lost/running
The transition is invoked only in the running and unhealthy states, so I think this is not possible?
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343106#comment-14343106 ]
Hadoop QA commented on YARN-3222:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12701861/0003-YARN-3222.patch
against trunk revision ca1c00b.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6800//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6800//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6800//console
This message is automatically generated.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343036#comment-14343036 ]
Rohith commented on YARN-3222:
bq. are the test failures related ?
Yes. Since totalCapability was set directly before sending NodeRemovedEvent, removeNode was removing the new capability from the cluster resource. I have handled this by sending RMNodeResourceUpdateEvent if there is any change in capability.
bq. we may not need to send the NODE_USABLE event, if the node were already at the running state, right ?
Yes, done.
bq. we can make the following two condition checks consistent as checking for RUNNING
Here the check is for a state that is not unhealthy, which means the node state can be decommissioned/lost/running. I'd suggest keeping it as it is.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340653#comment-14340653 ]
Jian He commented on YARN-3222:
Actually, we may not need to send the NODE_USABLE event if the node was already in the running state, right? Also, we can make the following two condition checks consistent by checking for RUNNING in both.
{code}
if (rmNode.getState() != NodeState.UNHEALTHY) {
  // Only add new node if old state is not UNHEALTHY
if (rmNode.getState().equals(NodeState.RUNNING)) {
{code}
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340618#comment-14340618 ]
Jian He commented on YARN-3222:
LGTM. Are the test failures related?
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340041#comment-14340041 ]
Hadoop QA commented on YARN-3222:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12701317/0002-YARN-3222.patch
against trunk revision 48c7ee7.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6779//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6779//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6779//console
This message is automatically generated.
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339972#comment-14339972 ]

Rohith commented on YARN-3222:
--

Updated the patch to handle the following scenarios:
# Avoid sending the NODE_RESOURCE_UPDATE event to the scheduler from RMNode when a NODE_ADDED event has already been sent.
# Send the NODE_USABLE event only if the reconnected node is healthy.
# Update the {{totalCapability}} resource in RMNode if the reconnected node is the same as the old node.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Rohith
> Assignee: Rohith
> Priority: Critical
> Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the
> scheduler with the events node_added, node_removed or node_resource_update. These
> events should be sent in sequential order, i.e. the node_added event first and
> then the node_resource_update event.
> But if the node is reconnected with a different http port, the order of
> scheduler events is node_removed --> node_resource_update --> node_added,
> which causes the scheduler to not find the node, throw an NPE, and the RM to exit.
> The node_resource_update event should always be triggered via
> RMNodeEventType.RESOURCE_UPDATE

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339799#comment-14339799 ]

Rohith commented on YARN-3222:
--

bq. NODE_USABLE event is sent regardless the reconnected node is healthy or not healthy, which is incorrect, right ?

Yes. I think the assumption was that if a new node is reconnecting, then the NM is healthy. It is better to retain the old state, i.e. UNHEALTHY, and let the first heartbeat after the reconnect move the node from UNHEALTHY to RUNNING.

I see another potential issue: if the old node is retained, then the RMNode's {{totalCapability}} has to be updated with the new RMNode's resource. But in the current flow, {{totalCapability}} is not updated. As a result, the scheduler has the updated resource value but the RMNode holds a stale one. Any client that reads the node capability from the RMNode would get a wrong node resource value.
{code}
if (noRunningApps) {
  // some code
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodeRemovedSchedulerEvent(rmNode));

  if (rmNode.getHttpPort() == newNode.getHttpPort()) {
    if (rmNode.getState() != NodeState.UNHEALTHY) {
      // Only add new node if old state is not UNHEALTHY
      rmNode.context.getDispatcher().getEventHandler().handle(
          new NodeAddedSchedulerEvent(newNode));
      // NEW NODE CAPABILITY SHOULD BE UPDATED TO OLD NODE
    }
  } else {
    // Reconnected node differs, so replace old node and start new node
    rmNode.context.getDispatcher().getEventHandler().handle(
        new RMNodeStartedEvent(newNode.getNodeID(), null, null));
    // No need to update totalCapability since old node is replaced with new node.
  }
}
{code}

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Rohith
> Assignee: Rohith
> Priority: Critical
> Attachments: 0001-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the
> scheduler with the events node_added, node_removed or node_resource_update. These
> events should be sent in sequential order, i.e. the node_added event first and
> then the node_resource_update event.
> But if the node is reconnected with a different http port, the order of
> scheduler events is node_removed --> node_resource_update --> node_added,
> which causes the scheduler to not find the node, throw an NPE, and the RM to exit.
> The node_resource_update event should always be triggered via
> RMNodeEventType.RESOURCE_UPDATE

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
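The stale {{totalCapability}} concern in the comment above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: {{ToyRMNode}}, the {{reconnect}} helper, its {{refreshCapability}} flag, and the port and memory values are all hypothetical and are not the real RMNodeImpl API or the contents of the attached patch.

```java
/** Toy model only; ToyRMNode is a hypothetical stand-in for the real RMNode. */
final class StaleCapabilitySketch {

    /** Hypothetical node record: an HTTP port and a total capability in MB. */
    static final class ToyRMNode {
        final int httpPort;
        int totalCapabilityMb;

        ToyRMNode(int httpPort, int totalCapabilityMb) {
            this.httpPort = httpPort;
            this.totalCapabilityMb = totalCapabilityMb;
        }
    }

    /**
     * Hypothetical reconnect handling. With the same HTTP port the old node
     * object is retained, so its capability has to be refreshed explicitly.
     * Returns the node object that stays registered.
     */
    static ToyRMNode reconnect(ToyRMNode oldNode, ToyRMNode newNode,
                               boolean refreshCapability) {
        if (oldNode.httpPort == newNode.httpPort) {
            if (refreshCapability) {
                oldNode.totalCapabilityMb = newNode.totalCapabilityMb;
            }
            return oldNode;
        }
        // Different port: the old node is replaced, so nothing needs copying.
        return newNode;
    }

    public static void main(String[] args) {
        ToyRMNode stale = reconnect(
            new ToyRMNode(8042, 4096), new ToyRMNode(8042, 8192), false);
        ToyRMNode fresh = reconnect(
            new ToyRMNode(8042, 4096), new ToyRMNode(8042, 8192), true);
        System.out.println("without refresh: " + stale.totalCapabilityMb + " MB");
        System.out.println("with refresh:    " + fresh.totalCapabilityMb + " MB");
    }
}
```

In this toy, a client reading the retained node without the refresh still sees the old value, which is the stale-memory situation described in the comment.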
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339270#comment-14339270 ]

Jian He commented on YARN-3222:
---

Looks good to me. While looking at this, I may have found another bug: the NODE_USABLE event is sent regardless of whether the reconnected node is healthy or not, which is incorrect, right?
{code}
rmNode.context.getDispatcher().getEventHandler().handle(
    new NodesListManagerEvent(
        NodesListManagerEventType.NODE_USABLE, rmNode));
{code}

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Rohith
> Assignee: Rohith
> Priority: Critical
> Attachments: 0001-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the
> scheduler with the events node_added, node_removed or node_resource_update. These
> events should be sent in sequential order, i.e. the node_added event first and
> then the node_resource_update event.
> But if the node is reconnected with a different http port, the order of
> scheduler events is node_removed --> node_resource_update --> node_added,
> which causes the scheduler to not find the node, throw an NPE, and the RM to exit.
> The node_resource_update event should always be triggered via
> RMNodeEventType.RESOURCE_UPDATE

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
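One way to picture the health check asked about in the comment above is the toy below. It is a sketch only: {{ToyNodeState}} and {{notificationsOnReconnect}} are hypothetical names invented for illustration, the notification is modelled as a plain string, and nothing here reflects how the eventual patch is actually written.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; the enum and method below are hypothetical, not YARN APIs. */
final class NodeUsableSketch {

    enum ToyNodeState { RUNNING, UNHEALTHY }

    /**
     * Returns the node-list notifications this toy reconnect would emit.
     * NODE_USABLE is produced only when the node is not UNHEALTHY.
     */
    static List<String> notificationsOnReconnect(ToyNodeState state) {
        List<String> notifications = new ArrayList<>();
        if (state != ToyNodeState.UNHEALTHY) {
            notifications.add("NODE_USABLE");
        }
        return notifications;
    }

    public static void main(String[] args) {
        System.out.println("RUNNING   -> "
            + notificationsOnReconnect(ToyNodeState.RUNNING));
        System.out.println("UNHEALTHY -> "
            + notificationsOnReconnect(ToyNodeState.UNHEALTHY));
    }
}
```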
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335958#comment-14335958 ]

Rohith commented on YARN-3222:
--

[~jianhe], kindly review the analysis and the patch. I had a look at the test failures and I don't think they are related to this patch.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Rohith
> Assignee: Rohith
> Priority: Critical
> Attachments: 0001-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the
> scheduler with the events node_added, node_removed or node_resource_update. These
> events should be sent in sequential order, i.e. the node_added event first and
> then the node_resource_update event.
> But if the node is reconnected with a different http port, the order of
> scheduler events is node_removed --> node_resource_update --> node_added,
> which causes the scheduler to not find the node, throw an NPE, and the RM to exit.
> The node_resource_update event should always be triggered via
> RMNodeEventType.RESOURCE_UPDATE

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1406#comment-1406 ]

Hadoop QA commented on YARN-3222:
-

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12700180/0001-YARN-3222.patch
  against trunk revision fe7a302.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
  Please justify why no new tests are needed for this patch.
  Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
  org.apache.hadoop.yarn.server.resourcemanager.reservation.TestFairReservationSystem

The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6698//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6698//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6698//console

This message is automatically generated.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Rohith
> Assignee: Rohith
> Priority: Critical
> Attachments: 0001-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the
> scheduler with the events node_added, node_removed or node_resource_update. These
> events should be sent in sequential order, i.e. the node_added event first and
> then the node_resource_update event.
> But if the node is reconnected with a different http port, the order of
> scheduler events is node_removed --> node_resource_update --> node_added,
> which causes the scheduler to not find the node, throw an NPE, and the RM to exit.
> The node_resource_update event should always be triggered via
> RMNodeEventType.RESOURCE_UPDATE

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333252#comment-14333252 ]

Rohith commented on YARN-3222:
--

Kindly review the patch. Since no tests were added, the patch was verified manually by deploying it in a cluster.

In the patch, I have moved handlingRunningApplications inside the else block; it does not need to be common to the noAppsRunning case.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Rohith
> Assignee: Rohith
> Priority: Critical
> Attachments: 0001-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the
> scheduler with the events node_added, node_removed or node_resource_update. These
> events should be sent in sequential order, i.e. the node_added event first and
> then the node_resource_update event.
> But if the node is reconnected with a different http port, the order of
> scheduler events is node_removed --> node_resource_update --> node_added,
> which causes the scheduler to not find the node, throw an NPE, and the RM to exit.
> The node_resource_update event should always be triggered via
> RMNodeEventType.RESOURCE_UPDATE

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329136#comment-14329136 ]

Rohith commented on YARN-3222:
--

I see two ways of fixing the issue.
# Always send the NODE_RESOURCE_UPDATE event to the scheduler via RMNodeEventType.RESOURCE_UPDATE of the RMNode.
# When a NODE_ADDED event is sent to the scheduler, sending a NODE_RESOURCE_UPDATE event for the same node again from ReconnectNodeTransition is a duplicate update request, because the scheduler has already picked up the resources of the newly added node, i.e. NODE_REMOVED --> NODE_ADDED --> NODE_RESOURCE_UPDATE. So if no applications are running on the node, it is not necessary to send the node_resource_update request.

I prefer the second option because it eliminates one duplicate resource update.

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Rohith
> Assignee: Rohith
> Priority: Critical
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the
> scheduler with the events node_added, node_removed or node_resource_update. These
> events should be sent in sequential order, i.e. the node_added event first and
> then the node_resource_update event.
> But if the node is reconnected with a different http port, the order of
> scheduler events is node_removed --> node_resource_update --> node_added,
> which causes the scheduler to not find the node, throw an NPE, and the RM to exit.
> The node_resource_update event should always be triggered via
> RMNodeEventType.RESOURCE_UPDATE

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
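The ordering problem discussed in the comment above can be reproduced in miniature with a toy scheduler that only tracks capacity per node id. This is a rough sketch under assumptions: {{ToyScheduler}}, {{Event}}, {{replay}}, the node id and the memory sizes are hypothetical, and the toy raises an explicit IllegalStateException where the real scheduler was reported to hit an NPE.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the ordering problem; these are NOT the real YARN classes. */
final class EventOrderSketch {

    enum EventType { NODE_ADDED, NODE_REMOVED, NODE_RESOURCE_UPDATE }

    /** Hypothetical scheduler event: a type, a node id, and a memory size in MB. */
    static final class Event {
        final EventType type;
        final String nodeId;
        final int memoryMb;

        Event(EventType type, String nodeId, int memoryMb) {
            this.type = type;
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
        }
    }

    /** Hypothetical scheduler that only tracks capacity per node id. */
    static final class ToyScheduler {
        private final Map<String, Integer> capacityMb = new HashMap<>();

        void handle(Event e) {
            switch (e.type) {
                case NODE_ADDED:
                    capacityMb.put(e.nodeId, e.memoryMb);
                    break;
                case NODE_REMOVED:
                    capacityMb.remove(e.nodeId);
                    break;
                case NODE_RESOURCE_UPDATE:
                    // The toy fails explicitly when the node is unknown.
                    if (!capacityMb.containsKey(e.nodeId)) {
                        throw new IllegalStateException(
                            "update for unknown node " + e.nodeId);
                    }
                    capacityMb.put(e.nodeId, e.memoryMb);
                    break;
            }
        }
    }

    /** Replays events on a scheduler that already knows the node; true if none failed. */
    static boolean replay(List<Event> events) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.handle(new Event(EventType.NODE_ADDED, "127.0.0.1:1234", 4096));
        try {
            for (Event e : events) {
                scheduler.handle(e);
            }
            return true;
        } catch (IllegalStateException ex) {
            return false;
        }
    }

    /** The order seen in the report: removed, then update, then added. */
    static List<Event> badOrder() {
        return Arrays.asList(
            new Event(EventType.NODE_REMOVED, "127.0.0.1:1234", 0),
            new Event(EventType.NODE_RESOURCE_UPDATE, "127.0.0.1:1234", 8192),
            new Event(EventType.NODE_ADDED, "127.0.0.1:1234", 8192));
    }

    /** The sequential order: removed, then added, then update. */
    static List<Event> goodOrder() {
        return Arrays.asList(
            new Event(EventType.NODE_REMOVED, "127.0.0.1:1234", 0),
            new Event(EventType.NODE_ADDED, "127.0.0.1:1234", 8192),
            new Event(EventType.NODE_RESOURCE_UPDATE, "127.0.0.1:1234", 8192));
    }

    public static void main(String[] args) {
        System.out.println("removed -> update -> added ok? " + replay(badOrder()));
        System.out.println("removed -> added -> update ok? " + replay(goodOrder()));
    }
}
```

In the second ordering the final update carries the same value the NODE_ADDED event already supplied, which is the duplicate that option 2 proposes to drop.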
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326965#comment-14326965 ]

Rohith commented on YARN-3222:
--

Attaching the logs, which give more information about the issue. In the log below, the RM shut down with an NPE while updating the node resource. Observe the scheduler events dispatched from AsyncDispatcher in *org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.\**. Here the order is NODE_REMOVED --> NODE_RESOURCE_UPDATE --> NODE_ADDED --> NODE_LABELS_UPDATE.
{noformat}
2015-02-19 09:14:57,212 INFO [main] util.RackResolver (RackResolver.java:coreResolve(109)) - Resolved 127.0.0.1 to /default-rack
2015-02-19 09:14:57,213 INFO [main] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(313)) - Reconnect from the node at: 127.0.0.1
2015-02-19 09:14:57,215 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeReconnectEvent.EventType: RECONNECTED
2015-02-19 09:14:57,215 INFO [main] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(343)) - NodeManager from node 127.0.0.1(cmPort: 1234 httpPort: 3) registered with capability: , assigned nodeId 127.0.0.1:1234
2015-02-19 09:14:57,215 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(412)) - Processing 127.0.0.1:1234 of type RECONNECTED
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeRemovedSchedulerEvent.EventType: NODE_REMOVED
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStartedEvent.EventType: STARTED
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(412)) - Processing 127.0.0.1:1234 of type STARTED
2015-02-19 09:14:57,266 INFO [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(424)) - 127.0.0.1:1234 Node Transitioned from NEW to RUNNING
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEvent.EventType: NODE_USABLE
2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeResourceUpdateSchedulerEvent.EventType: NODE_RESOURCE_UPDATE
2015-02-19 09:14:57,267 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeAddedSchedulerEvent.EventType: NODE_ADDED
2015-02-19 09:14:57,267 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEvent.EventType: NODE_USABLE
2015-02-19 09:14:57,267 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType: NODE_LABELS_UPDATE
2015-02-19 09:14:57,267 INFO [ResourceManager Event Processor] capacity.CapacityScheduler (CapacityScheduler.java:removeNode(1267)) - Removed node 127.0.0.1:1234 clusterResource:
2015-02-19 09:14:57,267 FATAL [ResourceManager Event Processor] resourcemanager.ResourceManager (ResourceManager.java:run(688)) - Error in handling event type NODE_RESOURCE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:548)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeAndQueueResource(CapacityScheduler.java:992)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1119)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:679)
	at java.lang.Thread.run(Thread.java:745)
2015-02-19 09:14:57,280 INFO [ResourceManager Event Processor] resourcemanager.ResourceManager (ResourceManager.java:run(692)) - Exiting, bbye..
{noformat