[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-07-20 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6102:

Attachment: YARN-6102.07.patch

attached the patch fixing java doc errors. 

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, 
> YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch, 
> YARN-6102.06.patch, YARN-6102.07.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-07-20 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6102:

Attachment: YARN-6102.06.patch

Updating a patch adding more java doc as per Sunil's offline comments. 

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, 
> YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch, YARN-6102.06.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-07-12 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6102:

Attachment: YARN-6102.05.patch

patch for fixing findbug

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, 
> YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-07-12 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6102:

Attachment: YARN-6102.04.patch

updated the patch fixing review comments.
# renamed globalServiceContext to serviceContext. 
# grouped setters/getters of serviceContext and active service context 
separately.

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, 
> YARN-6102.03.patch, YARN-6102.04.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-07-11 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6102:

Attachment: YARN-6102.03.patch

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, 
> YARN-6102.03.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-07-11 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6102:

Attachment: YARN-6102.02.patch

updated the patch adding test case and some minor refactoring of RMContext. 

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-07-10 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6102:

Attachment: YARN-6102.01.patch

updated the proposed 2nd solution that modifies YARN code. 

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
> Attachments: eventOrder.JPG, YARN-6102.01.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-03-27 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6102:
-
Target Version/s: 2.8.1  (was: 2.8.0)

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
> Attachments: eventOrder.JPG
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-01-17 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-6102:
---
Summary: On failover RM can crash due to unregistered event to 
AsyncDispatcher  (was: RM can crash due to unregistered event to 
AsyncDispatcher on transitionToStandby)

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
> Attachments: eventOrder.JPG
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-01-17 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6102:
--
Attachment: eventOrder.JPG

attaching event order for reference

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
> Attachments: eventOrder.JPG
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-01-17 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-6102:

Target Version/s: 2.8.0

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) On failover RM can crash due to unregistered event to AsyncDispatcher

2017-01-17 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-6102:

Affects Version/s: 2.8.0
   2.7.3

> On failover RM can crash due to unregistered event to AsyncDispatcher
> -
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Critical
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack i was also noticed in {{TestResourceTrackerOnHA}} exits 
> abnormally, after some analysis, i was able to reproduce.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now between {{resetDispatcher();}} and {{createAndInitActiveServices();}} , 
> the {{ResourceTrackerService.nodeHeartbeat}} invokes dipatcher
> This will cause the above error as at point of time when {{STATUS_UPDATE}} 
> event is given to dispatcher in {{ResourceTrackerService}} , the new 
> dispatcher(from the failover) may be started but not yet registered for events
> Using same steps(with pausing JVM at debug), i was able to reproduce this in 
> production cluster also. for {{STATUS_UPDATE}} active service event, when the 
> service is yet to forward the event to RM dispatcher but a failover is called 
> and dispatcher reset is between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org