[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-08-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3999:
--
Attachment: (was: YARN-3999.5.patch)

 RM hangs on draining events
 -

 Key: YARN-3999
 URL: https://issues.apache.org/jira/browse/YARN-3999
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, 
 YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.patch, YARN-3999.patch


 If external systems such as ATS (the Application Timeline Server) or ZK 
 (ZooKeeper) become very slow, draining all the pending events takes a long 
 time. If this takes longer than 10 minutes, all applications will expire. 
 Fixes include:
 1. Add a timeout and stop the dispatcher even if not all events have been drained.
 2. Move the ATS service out of the RM active services so that the RM does not 
 need to wait for ATS to flush its events when transitioning to standby.
 3. Stop the client-facing services (ClientRMService etc.) first so that clients 
 are notified quickly that the RM is stopping or transitioning.
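Fix 1 can be illustrated with a minimal sketch, assuming a single handler thread that consumes a queue of events, as the RM's dispatcher does. The class name DrainingDispatcher, its methods, the drain timeout parameter and the join grace period are hypothetical and are not the actual AsyncDispatcher API. stop() waits for pending events only until the deadline, then stops the handler thread anyway and reports whether everything was drained.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch of a dispatcher with a bounded drain; not the Hadoop AsyncDispatcher API. */
public class DrainingDispatcher {
    // Assumed bound on how long stop() waits for the handler thread to exit.
    private static final long JOIN_GRACE_MS = 1000;

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger pending = new AtomicInteger(); // queued + in-flight events
    private final Consumer<String> handler;
    private final long drainTimeoutMs;
    private final Thread worker;
    private volatile boolean stopped = false;

    public DrainingDispatcher(Consumer<String> handler, long drainTimeoutMs) {
        this.handler = handler;
        this.drainTimeoutMs = drainTimeoutMs;
        this.worker = new Thread(this::handleEvents, "event-handler");
        this.worker.setDaemon(true);
    }

    public void start() { worker.start(); }

    public void dispatch(String event) {
        pending.incrementAndGet();
        queue.add(event);
    }

    private void handleEvents() {
        while (!stopped && !Thread.currentThread().isInterrupted()) {
            try {
                String event = queue.poll(100, TimeUnit.MILLISECONDS);
                if (event != null) {
                    try {
                        handler.accept(event);
                    } finally {
                        pending.decrementAndGet();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // ends the loop
            }
        }
    }

    /**
     * Waits up to drainTimeoutMs for pending events, then stops regardless.
     * Returns true only if every dispatched event was handled before the deadline.
     */
    public boolean stop() {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(drainTimeoutMs);
        try {
            while (pending.get() > 0 && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean drained = pending.get() == 0;
        stopped = true;     // events still queued are abandoned, not handled
        worker.interrupt(); // unblock a handler stuck on a slow external call
        try {
            worker.join(JOIN_GRACE_MS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return drained;
    }
}
```

The design choice is to trade completeness for bounded shutdown time: with a slow handler (for example one blocked on ATS or ZK), stop() returns shortly after the timeout with false instead of blocking until the whole queue is processed.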



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-08-11 Thread Jian He (JIRA)


Jian He updated YARN-3999:
--
Attachment: YARN-3999.5.patch

 Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, 
 YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, 
 YARN-3999.patch


[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-08-11 Thread Jian He (JIRA)


Jian He updated YARN-3999:
--
Attachment: YARN-3999-branch-2.7.patch

Uploaded the branch-2.7 patch.

 Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, 
 YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, 
 YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch


[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-08-10 Thread Jian He (JIRA)


Jian He updated YARN-3999:
--
Attachment: YARN-3999.5.patch

Fixing test failures.

 Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, 
 YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, 
 YARN-3999.patch


[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-08-10 Thread Jian He (JIRA)


Jian He updated YARN-3999:
--
Summary: RM hangs on draining events  (was: Add a timeout when draining the 
dispatcher)

 Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, 
 YARN-3999.3.patch, YARN-3999.patch, YARN-3999.patch


 If external systems such as ATS or ZK become very slow, draining all the 
 events takes a long time. If this takes longer than 10 minutes, all 
 applications will expire. We can add a timeout and stop the dispatcher even 
 if not all events have been drained.



[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-08-10 Thread Jian He (JIRA)


Jian He updated YARN-3999:
--
Description: 
If external systems such as ATS or ZK become very slow, draining all the events 
takes a long time. If this takes longer than 10 minutes, all applications will 
expire. Fixes include:
1. Add a timeout and stop the dispatcher even if not all events have been drained.
2. Move the ATS service out of the RM active services so that the RM does not need 
to wait for ATS to flush its events when transitioning to standby.
3. Stop the client-facing services (ClientRMService etc.) first so that clients are 
notified quickly that the RM is stopping or transitioning.

  was:If external systems such as ATS or ZK become very slow, draining all the 
events takes a long time. If this takes longer than 10 minutes, all 
applications will expire. We can add a timeout and stop the dispatcher even if 
not all events have been drained.
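Fixes 2 and 3 in the updated description are both about the order in which services stop. The sketch below is a minimal illustration, assuming the RM keeps two groups of services: an always-on group that survives an HA transition (where the ATS publisher would live) and an active group that is torn down on transition to standby. The names ServiceOrdering, ServiceGroup, RecordingService and the sample service names other than ClientRMService are hypothetical, not Hadoop's service API. Each group stops its members in reverse registration order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of RM-style service grouping; not the Hadoop service API. */
public class ServiceOrdering {
    public interface Service {
        void stop();
    }

    /** Illustrative service that records its name in a shared log when stopped. */
    public static final class RecordingService implements Service {
        private final String name;
        private final List<String> log;

        public RecordingService(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void stop() { log.add(name); }
    }

    /** Stops children in reverse registration order: last added, first stopped. */
    public static final class ServiceGroup {
        private final List<Service> services = new ArrayList<>();

        public void add(Service s) { services.add(s); }

        public void stop() {
            for (int i = services.size() - 1; i >= 0; i--) {
                services.get(i).stop();
            }
        }
    }

    // Survives HA transitions (fix 2: the ATS publisher is registered here).
    public final ServiceGroup alwaysOn = new ServiceGroup();
    // Torn down on standby; client-facing services are registered last (fix 3).
    public final ServiceGroup active = new ServiceGroup();

    /** Going to standby stops only the active group, so it never waits on ATS. */
    public void transitionToStandby() { active.stop(); }

    /** Full shutdown: active services first, then the always-on ones. */
    public void stop() {
        active.stop();
        alwaysOn.stop();
    }
}
```

Registering ClientRMService last in the active group makes it the first service stopped, so clients see the RM go away before slower internal services shut down, and a transition to standby never touches the ATS publisher.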


 Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, 
 YARN-3999.3.patch, YARN-3999.patch, YARN-3999.patch


[jira] [Updated] (YARN-3999) RM hangs on draining events

2015-08-10 Thread Jian He (JIRA)


Jian He updated YARN-3999:
--
Attachment: YARN-3999.4.patch

Uploaded a new patch.

 Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, 
 YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.patch, YARN-3999.patch

