[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8012:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Note that these cases are caused, or made worse, by work-preserving NM restart 
> being enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node 
> and free them up for other urgent computations there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.
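At its core, the leak described above is a set difference: containers actually alive on the node minus containers the NM still tracks. The sketch below illustrates that detection step only; all class and method names are hypothetical illustrations for this discussion, not part of the attached patch or the NM API.

```java
import java.util.HashSet;
import java.util.Set;

public class UnmanagedContainerCleanup {

    // Containers running on the node but unknown to the NM are the
    // unmanaged / leaked set that a cleanup service would have to kill.
    static Set<String> findUnmanaged(Set<String> runningOnNode,
                                     Set<String> managedByNm) {
        Set<String> leaked = new HashSet<>(runningOnNode);
        leaked.removeAll(managedByNm);
        return leaked;
    }

    public static void main(String[] args) {
        // Example: the NM lost track of container _02 (e.g. after a
        // corrupted local store), so it is unmanaged.
        Set<String> running = Set.of("container_01", "container_02", "container_03");
        Set<String> managed = Set.of("container_01", "container_03");
        System.out.println(findUnmanaged(running, managed)); // prints [container_02]
    }
}
```

A real cleanup service would derive the "running" set from the OS (e.g. container job objects on Windows, cgroups on Linux) and the "managed" set from the NM state store, then kill each leaked container to release its resources.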



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2020-04-10 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-8012:
---
Target Version/s: 3.4.0  (was: 3.3.0)

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Note that these cases are caused, or made worse, by work-preserving NM restart 
> being enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node 
> and free them up for other urgent computations there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-11-16 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8012:
-
Target Version/s: 3.3.0  (was: 2.7.1, 3.2.0)

Bulk update: moved all 3.2.0 non-blocker issues, please move back if it is a 
blocker.

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Note that these cases are caused, or made worse, by work-preserving NM restart 
> being enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node 
> and free them up for other urgent computations there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8012:
-
Target Version/s: 2.7.1, 3.2.0  (was: 2.7.1, 3.0.0)

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Note that these cases are caused, or made worse, by work-preserving NM restart 
> being enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node 
> and free them up for other urgent computations there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-8012:
-
Target Version/s: 3.0.0, 2.7.1  (was: 2.7.1, 3.0.0)
   Fix Version/s: (was: 2.7.1)

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Note that these cases are caused, or made worse, by work-preserving NM restart 
> being enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node 
> and free them up for other urgent computations there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-20 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Attachment: YARN-8012 - Unmanaged Container Cleanup.pdf

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012 - Unmanaged Container Cleanup.pdf, 
> YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Note that these cases are caused, or made worse, by work-preserving NM restart 
> being enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node 
> and free them up for other urgent computations there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container that is no longer 
managed by the NM and therefore can no longer be managed by YARN either; it is 
leaked.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 * The NM has bugs, e.g. it wrongly marks a live container as complete.

Note that these cases are caused, or made worse, by work-preserving NM restart 
being enabled; see YARN-1336.

*Bad impacts of an unmanaged container include:*
 # Resources on the node cannot be managed by YARN:
 ** It causes a YARN resource leak on the node.
 ** The container cannot be killed to release its YARN resources on the node 
and free them up for other urgent computations there.
 # Container and app killing is not eventually consistent for the app user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.

  was:
An *unmanaged container / leaked container* is a container that is no longer 
managed by the NM and therefore can no longer be managed by YARN either; it is 
leaked.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 * The NM has bugs, e.g. it wrongly marks a live container as complete.

Note that these cases are caused, or made worse, by work-preserving NM restart 
being enabled; see YARN-1336.

*Bad impacts of an unmanaged container include:*
 # Resources on the node cannot be managed by YARN:
 ** It causes a YARN resource leak on the node.
 ** The container cannot be killed to release its YARN resources on the node 
and free them up for other native processes there.
 # Container and app killing is not eventually consistent for the app user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Note that these cases are caused, or made worse, by work-preserving NM restart 
> being enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node 
> and free them up for other urgent computations there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container that is no longer 
managed by the NM and therefore can no longer be managed by YARN either; it is 
leaked.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 * The NM has bugs, e.g. it wrongly marks a live container as complete.

Note that these cases are caused, or made worse, by work-preserving NM restart 
being enabled; see YARN-1336.

*Bad impacts of an unmanaged container include:*
 # Resources on the node cannot be managed by YARN:
 ** It causes a YARN resource leak on the node.
 ** The container cannot be killed to release its YARN resources on the node 
and free them up for other native processes there.
 # Container and app killing is not eventually consistent for the app user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.

  was:
An *unmanaged container / leaked container* is a container that is no longer 
managed by the NM and therefore can no longer be managed by YARN either; it is 
leaked.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 * The NM has bugs, e.g. it wrongly marks a live container as complete.

Note that these cases are caused, or made worse, by work-preserving NM restart 
being enabled; see YARN-1336.

*Bad impacts of an unmanaged container include:*
 # Resources on the node cannot be managed by YARN:
 ** It causes a YARN resource leak on the node.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and app killing is not eventually consistent for the app user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Note that these cases are caused, or made worse, by work-preserving NM restart 
> being enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node 
> and free them up for other native processes there.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container that is no longer 
managed by the NM and therefore can no longer be managed by YARN either; it is 
leaked.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 * The NM has bugs, e.g. it wrongly marks a live container as complete.

Note that these cases are caused, or made worse, by work-preserving NM restart 
being enabled; see YARN-1336.

*Bad impacts of an unmanaged container include:*
 # Resources on the node cannot be managed by YARN:
 ** It causes a YARN resource leak on the node.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and app killing is not eventually consistent for the app user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.

  was:
An *unmanaged container / leaked container* is a container that is no longer 
managed by the NM and therefore can no longer be managed by YARN either; it is 
leaked.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 * The NM has bugs, e.g. it wrongly marks a live container as complete.

Things become worse if work-preserving NM restart is enabled; see YARN-1336.

*Bad impacts of an unmanaged container include:*
 # Resources on the node cannot be managed by YARN:
 ** It causes a YARN resource leak on the node.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and app killing is not eventually consistent for the app user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Note that these cases are caused, or made worse, by work-preserving NM restart 
> being enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container that is no longer 
managed by the NM and therefore can no longer be managed by YARN either; it is 
leaked.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 * The NM has bugs, e.g. it wrongly marks a live container as complete.

Things become worse if work-preserving NM restart is enabled; see YARN-1336.

*Bad impacts of an unmanaged container include:*
 # Resources on the node cannot be managed by YARN:
 ** It causes a YARN resource leak on the node.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and app killing is not eventually consistent for the app user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.

  was:
An *unmanaged container / leaked container* is a container that is no longer 
managed by the NM and therefore can no longer be managed by YARN either; it is 
leaked.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 * The NM has bugs, e.g. it wrongly marks a live container as complete.

Things become worse if work-preserving NM restart is enabled; see 
[YARN-1336|https://issues.apache.org/jira/browse/YARN-1336].

*Bad impacts of an unmanaged container include:*
 # Resources cannot be managed for YARN and the node:
 ** It causes YARN and node resource leaks.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and app killing is not eventually consistent for the user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Things become worse if work-preserving NM restart is enabled; see YARN-1336.
> *Bad impacts of an unmanaged container include:*
>  # Resources on the node cannot be managed by YARN:
>  ** It causes a YARN resource leak on the node.
>  ** The container cannot be killed to release its YARN resources on the node.
>  # Container and app killing is not eventually consistent for the app user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.






[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container / leaked container* is a container that is no longer 
managed by the NM and therefore can no longer be managed by YARN either; it is 
leaked.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 * The NM service is disabled or removed on the node.
 * The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 * The NM has bugs, e.g. it wrongly marks a live container as complete.

Things become worse if work-preserving NM restart is enabled; see 
[YARN-1336|https://issues.apache.org/jira/browse/YARN-1336].

*Bad impacts of an unmanaged container include:*
 # Resources cannot be managed for YARN and the node:
 ** It causes YARN and node resource leaks.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and app killing is not eventually consistent for the user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.

  was:
An *unmanaged container* is a container that is no longer managed by the NM and 
therefore can no longer be managed by YARN either.

*There are many cases in which a YARN-managed container can become unmanaged, 
such as:*
 # For container resources managed by YARN, such as the container job object 
and disk data:
 ** The NM service is disabled or removed on the node.
 ** The NM is unable to start up again on the node, e.g. because configuration 
or resources it depends on cannot be made ready.
 ** The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
sectors.
 ** The NM has bugs, e.g. it wrongly marks a live container as complete.
 # For container resources unmanaged by YARN:
 ** The user breaks processes away from the container job object.
 ** The user creates VMs from the container job object.
 ** The user acquires other resources on the machine that are unmanaged by 
YARN, e.g. produces data outside the container folder.

*Bad impacts of an unmanaged container include:*
 # Resources cannot be managed for YARN and the node:
 ** It causes YARN and node resource leaks.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and app killing is not eventually consistent for the user:
 ** A buggy app can keep producing bad external impacts long after it has been 
killed.


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container / leaked container* is a container that is no longer 
> managed by the NM and therefore can no longer be managed by YARN either; it is 
> leaked.
> *There are many cases in which a YARN-managed container can become unmanaged, 
> such as:*
>  * The NM service is disabled or removed on the node.
>  * The NM is unable to start up again on the node, e.g. because configuration 
> or resources it depends on cannot be made ready.
>  * The NM's local leveldb store is corrupted or lost, e.g. due to bad disk 
> sectors.
>  * The NM has bugs, e.g. it wrongly marks a live container as complete.
> Things become worse if work-preserving NM restart is enabled; see 
> [YARN-1336|https://issues.apache.org/jira/browse/YARN-1336]
> *Bad impacts of an unmanaged container include:*
>  # Resources cannot be managed for YARN and the node:
>  ** It causes YARN and node resource leaks.
>  ** The container cannot be killed to release its YARN resources on the node.
>  # Container and app killing is not eventually consistent for the user:
>  ** A buggy app can keep producing bad external impacts long after it has been 
> killed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Description: 
An *unmanaged container* is a container which is no longer managed by the NM. Thus, 
it cannot be managed by YARN either.

*There are many cases in which a YARN-managed container can become unmanaged, such as:*
 # For container resources managed by YARN, such as the container job object
 and disk data:
 ** The NM service is disabled or removed on the node.
 ** The NM is unable to start up again on the node, e.g., required configuration 
or resources are not ready.
 ** The NM's local leveldb store is corrupted or lost, e.g., due to bad disk sectors.
 ** The NM has bugs, such as wrongly marking a live container as complete.
 # For container resources unmanaged by YARN:
 ** The user breaks processes away from the container job object.
 ** The user creates VMs from the container job object.
 ** The user acquires other resources on the machine that are unmanaged by
 YARN, such as producing data outside the container folder.

*Bad impacts of an unmanaged container include:*
 # Resources cannot be managed by YARN on the node:
 ** Causes YARN and node resource leaks.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and App killing is not eventually consistent for the user:
 ** A buggy App can still produce bad impacts to the outside world long after the App 
is killed.

  was:
An *unmanaged container* is a container which is no longer managed by the NM. Thus, 
it cannot be managed by YARN either.

*There are many cases in which a YARN-managed container can become unmanaged, such as:*
 # For container resources managed by YARN, such as the container job object
 and disk data:
 ** The NM service is disabled or removed on the node.
 ** The NM is unable to start up again on the node, e.g., required configuration 
or resources are not ready.
 ** The NM's local leveldb store is corrupted or lost, e.g., due to bad disk sectors.
 ** The NM has bugs, such as wrongly marking a live container as complete.
 # For container resources unmanaged by YARN:
 ** The user breaks processes away from the container job object.
 ** The user creates VMs from the container job object.
 ** The user acquires other resources on the machine that are unmanaged by
 YARN, such as producing data outside the container folder.

*Bad impacts of an unmanaged container include:*
 # Resources cannot be managed by YARN on the node:
 ** Causes YARN and node resource leaks.
 ** The container cannot be killed to release its YARN resources on the node.
 # Container and App killing is not eventually consistent for the user:
 ** A buggy App can still produce bad impacts to the outside world long after the App 
is killed.

*Initial patch for review:*

In the initial patch, the unmanaged container cleanup feature on Windows can only 
clean up the container job object of the unmanaged container. Cleanup for more 
container resources will be supported later, and UTs will be added once the 
design is agreed upon.

A container is considered unmanaged when:
 # The NM is dead:
 ** It failed to confirm within the timeout that the container is managed by the NM.
 # The NM is alive but the container is
 org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found:
 ** The container is reported as COMPLETE or is not found in the NM container list.


> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container* is a container which is no longer managed by the NM. 
> Thus, it cannot be managed by YARN either.
> *There are many cases in which a YARN-managed container can become unmanaged, such as:*
>  # For container resources managed by YARN, such as the container job object
>  and disk data:
>  ** The NM service is disabled or removed on the node.
>  ** The NM is unable to start up again on the node, e.g., required 
> configuration or resources are not ready.
>  ** The NM's local leveldb store is corrupted or lost, e.g., due to bad disk sectors.
>  ** The NM has bugs, such as wrongly marking a live container as complete.
>  # For container resources unmanaged by YARN:
>  ** The user breaks processes away from the container job object.
>  ** The user creates VMs from the container job object.
>  ** The user acquires other resources on the machine that are unmanaged by
>  YARN, such as producing data outside the container folder.
> *Bad impacts of an unmanaged container include:*
>  # Resources cannot be managed by YARN on the node:
>  ** Causes YARN and node resource leaks.
>  ** The container cannot be killed to release its YARN resources on the node.
>  # Container and App killing is not eventually consistent for the user:
>  ** A buggy App can still produce bad impacts to the outside world long after 
> the App is killed.

[jira] [Updated] (YARN-8012) Support Unmanaged Container Cleanup

2018-03-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-8012:

Attachment: YARN-8012-branch-2.7.1.001.patch

> Support Unmanaged Container Cleanup
> ---
>
> Key: YARN-8012
> URL: https://issues.apache.org/jira/browse/YARN-8012
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Major
> Fix For: 2.7.1
>
> Attachments: YARN-8012-branch-2.7.1.001.patch
>
>
> An *unmanaged container* is a container which is no longer managed by the NM. 
> Thus, it cannot be managed by YARN either.
> *There are many cases in which a YARN-managed container can become unmanaged, such as:*
>  # For container resources managed by YARN, such as the container job object
>  and disk data:
>  ** The NM service is disabled or removed on the node.
>  ** The NM is unable to start up again on the node, e.g., required 
> configuration or resources are not ready.
>  ** The NM's local leveldb store is corrupted or lost, e.g., due to bad disk sectors.
>  ** The NM has bugs, such as wrongly marking a live container as complete.
>  # For container resources unmanaged by YARN:
>  ** The user breaks processes away from the container job object.
>  ** The user creates VMs from the container job object.
>  ** The user acquires other resources on the machine that are unmanaged by
>  YARN, such as producing data outside the container folder.
> *Bad impacts of an unmanaged container include:*
>  # Resources cannot be managed by YARN on the node:
>  ** Causes YARN and node resource leaks.
>  ** The container cannot be killed to release its YARN resources on the node.
>  # Container and App killing is not eventually consistent for the user:
>  ** A buggy App can still produce bad impacts to the outside world long after 
> the App is killed.
> *Initial patch for review:*
> In the initial patch, the unmanaged container cleanup feature on Windows can 
> only clean up the container job object of the unmanaged container. Cleanup 
> for more container resources will be supported later, and UTs will be added 
> once the design is agreed upon.
> A container is considered unmanaged when:
>  # The NM is dead:
>  ** It failed to confirm within the timeout that the container is managed by the NM.
>  # The NM is alive but the container is
>  org.apache.hadoop.yarn.api.records.ContainerState#COMPLETE or not found:
>  ** The container is reported as COMPLETE or is not found in the NM container list.
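
The two detection conditions quoted above can be sketched in Java. This is a minimal illustration only; the class name `UnmanagedCheck`, the method `isUnmanaged`, and the simplified `ContainerState` enum are assumptions for the sketch and are not taken from the actual YARN-8012 patch.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the unmanaged-container detection described in the
 * proposal. Names here are illustrative, not part of the real patch.
 */
public class UnmanagedCheck {

  /** Simplified stand-in for org.apache.hadoop.yarn.api.records.ContainerState. */
  public enum ContainerState { NEW, RUNNING, COMPLETE }

  /**
   * @param nmReachable whether the NM answered the container status query
   *                    within the timeout (false models "NM is dead")
   * @param stateFromNm container state reported by the NM; empty if the
   *                    container is not found in the NM container list
   * @return true if the container should be treated as unmanaged
   */
  public static boolean isUnmanaged(boolean nmReachable,
                                    Optional<ContainerState> stateFromNm) {
    // Case 1: NM is dead -- we failed to confirm within the timeout that
    // the container is still managed.
    if (!nmReachable) {
      return true;
    }
    // Case 2: NM is alive, but the container is COMPLETE or not found.
    return stateFromNm.map(s -> s == ContainerState.COMPLETE).orElse(true);
  }

  public static void main(String[] args) {
    // NM dead: unmanaged regardless of any cached state.
    assert isUnmanaged(false, Optional.empty());
    // NM alive and container RUNNING: still managed.
    assert !isUnmanaged(true, Optional.of(ContainerState.RUNNING));
    // NM alive but container COMPLETE or missing: unmanaged.
    assert isUnmanaged(true, Optional.of(ContainerState.COMPLETE));
    assert isUnmanaged(true, Optional.empty());
  }
}
```

Under this reading, "not found" is treated the same as COMPLETE, which matches the proposal's second condition; only a RUNNING (or otherwise live) state reported by a reachable NM keeps the container managed.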


