[jira] [Updated] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-3999:
--------------------------
    Attachment:     (was: YARN-3999.5.patch)

RM hangs on draing events
-------------------------

        Key: YARN-3999
        URL: https://issues.apache.org/jira/browse/YARN-3999
    Project: Hadoop YARN
 Issue Type: Bug
   Reporter: Jian He
   Assignee: Jian He
Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.patch, YARN-3999.patch

If external systems such as ATS or ZK become very slow, draining all the events takes a long time. If draining takes longer than 10 minutes, all applications will expire.

Fixes include:
1. Add a timeout and stop the dispatcher even if not all events are drained.
2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush its events when transitioning to standby.
3. Stop client-facing services (ClientRMService etc.) first so that clients are notified quickly that the RM is stopping/transitioning.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
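The first fix in the description (a timeout on draining) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in the attached patches: the class `BoundedDrainSketch`, its `dispatch` and `stop(timeoutMillis)` methods, and the polling intervals are all hypothetical names and values chosen for the example, and it does not use the real YARN dispatcher classes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a bounded drain; not the YARN dispatcher code. */
class BoundedDrainSketch {
    private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
    private final Thread handlerThread;
    private volatile boolean stopped = false;

    BoundedDrainSketch() {
        // Single handler thread that runs queued events one at a time.
        handlerThread = new Thread(() -> {
            while (!stopped) {
                try {
                    Runnable event = eventQueue.poll(10, TimeUnit.MILLISECONDS);
                    if (event != null) {
                        event.run();
                    }
                } catch (InterruptedException e) {
                    return; // interrupted by stop()
                }
            }
        });
        handlerThread.setDaemon(true);
        handlerThread.start();
    }

    void dispatch(Runnable event) {
        eventQueue.add(event);
    }

    /**
     * Waits at most timeoutMillis for the queue to empty, then stops the
     * handler regardless. Returns the number of events left unprocessed.
     */
    int stop(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!eventQueue.isEmpty() && System.nanoTime() < deadline) {
            Thread.sleep(5);
        }
        stopped = true;
        handlerThread.interrupt(); // unblock a handler stuck in a slow event
        handlerThread.join(1000);
        return eventQueue.size();
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedDrainSketch dispatcher = new BoundedDrainSketch();
        for (int i = 0; i < 5; i++) {
            // Each event simulates a slow external system (e.g. ATS or ZK).
            dispatcher.dispatch(() -> {
                try {
                    Thread.sleep(10_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int abandoned = dispatcher.stop(200);
        System.out.println("Stopped with " + abandoned + " event(s) still queued");
    }
}
```

The trade-off in this sketch is the one the description accepts: events still queued when the deadline passes are abandoned, in exchange for a stop that completes in bounded time.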
[jira] [Updated] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-3999:
--------------------------
    Attachment: YARN-3999.5.patch

RM hangs on draing events
-------------------------

        Key: YARN-3999
        URL: https://issues.apache.org/jira/browse/YARN-3999
    Project: Hadoop YARN
 Issue Type: Bug
   Reporter: Jian He
   Assignee: Jian He
Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch
[jira] [Updated] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-3999:
--------------------------
    Attachment: YARN-3999-branch-2.7.patch

Uploaded the branch-2.7 patch.

RM hangs on draing events
-------------------------

        Key: YARN-3999
        URL: https://issues.apache.org/jira/browse/YARN-3999
    Project: Hadoop YARN
 Issue Type: Bug
   Reporter: Jian He
   Assignee: Jian He
Attachments: YARN-3999-branch-2.7.patch, YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch
[jira] [Updated] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-3999:
--------------------------
    Attachment: YARN-3999.5.patch

Fixing test failures.

RM hangs on draing events
-------------------------

        Key: YARN-3999
        URL: https://issues.apache.org/jira/browse/YARN-3999
    Project: Hadoop YARN
 Issue Type: Bug
   Reporter: Jian He
   Assignee: Jian He
Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.5.patch, YARN-3999.patch, YARN-3999.patch
[jira] [Updated] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-3999:
--------------------------
    Summary: RM hangs on draing events  (was: Add a timeout when drain the dispatcher)

RM hangs on draing events
-------------------------

        Key: YARN-3999
        URL: https://issues.apache.org/jira/browse/YARN-3999
    Project: Hadoop YARN
 Issue Type: Bug
   Reporter: Jian He
   Assignee: Jian He
Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.patch, YARN-3999.patch

If external systems such as ATS or ZK become very slow, draining all the events takes a long time. If draining takes longer than 10 minutes, all applications will expire.
We can add a timeout and stop the dispatcher even if not all events are drained.
[jira] [Updated] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-3999:
--------------------------
    Description:
If external systems such as ATS or ZK become very slow, draining all the events takes a long time. If draining takes longer than 10 minutes, all applications will expire.

Fixes include:
1. Add a timeout and stop the dispatcher even if not all events are drained.
2. Move the ATS service out of the RM active services so that the RM doesn't need to wait for ATS to flush its events when transitioning to standby.
3. Stop client-facing services (ClientRMService etc.) first so that clients are notified quickly that the RM is stopping/transitioning.

  was:
If external systems such as ATS or ZK become very slow, draining all the events takes a long time. If draining takes longer than 10 minutes, all applications will expire.
We can add a timeout and stop the dispatcher even if not all events are drained.

RM hangs on draing events
-------------------------

        Key: YARN-3999
        URL: https://issues.apache.org/jira/browse/YARN-3999
    Project: Hadoop YARN
 Issue Type: Bug
   Reporter: Jian He
   Assignee: Jian He
Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.patch, YARN-3999.patch
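Fixes 2 and 3 in the updated description are about which services are stopped on a transition to standby, and in what order. The sketch below illustrates that ordering only. It is not the ResourceManager code: `StandbyTransitionSketch`, its nested `Service` type, and the three lists are hypothetical stand-ins, and the service names in `main` are used purely as labels.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the stop ordering; not the ResourceManager code. */
class StandbyTransitionSketch {
    /** Minimal stand-in for a service with a stop hook. */
    static final class Service {
        final String name;
        boolean running = true;

        Service(String name) {
            this.name = name;
        }

        void stop() {
            running = false;
        }
    }

    // Active services that clients talk to directly; stopped first (fix 3).
    final List<Service> clientFacing = new ArrayList<>();
    // Remaining active services, e.g. an event dispatcher.
    final List<Service> otherActive = new ArrayList<>();
    // Services kept outside the active set, e.g. an ATS publisher (fix 2).
    final List<Service> alwaysOn = new ArrayList<>();

    /** Stops the active services, client-facing ones first; returns the stop order. */
    List<String> transitionToStandby() {
        List<String> order = new ArrayList<>();
        for (Service s : clientFacing) {
            s.stop();
            order.add(s.name);
        }
        for (Service s : otherActive) {
            s.stop();
            order.add(s.name);
        }
        // alwaysOn services are deliberately left running, so the transition
        // does not wait for them to flush their pending events.
        return order;
    }

    public static void main(String[] args) {
        StandbyTransitionSketch rm = new StandbyTransitionSketch();
        rm.clientFacing.add(new Service("ClientRMService"));
        rm.otherActive.add(new Service("Dispatcher"));
        rm.alwaysOn.add(new Service("ATSPublisher"));
        System.out.println("Stop order: " + rm.transitionToStandby());
    }
}
```

Stopping the client-facing list first means clients see the RM going away before the slower internal shutdown begins, which is the fast notification the description asks for.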
[jira] [Updated] (YARN-3999) RM hangs on draing events
[ https://issues.apache.org/jira/browse/YARN-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-3999:
--------------------------
    Attachment: YARN-3999.4.patch

Uploaded a new patch.

RM hangs on draing events
-------------------------

        Key: YARN-3999
        URL: https://issues.apache.org/jira/browse/YARN-3999
    Project: Hadoop YARN
 Issue Type: Bug
   Reporter: Jian He
   Assignee: Jian He
Attachments: YARN-3999.1.patch, YARN-3999.2.patch, YARN-3999.2.patch, YARN-3999.3.patch, YARN-3999.4.patch, YARN-3999.patch, YARN-3999.patch