[ https://issues.apache.org/jira/browse/YARN-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8489:
-----------------------------
    Description: 
The existing YARN service framework supports a termination policy for each
restart policy. For example, ALWAYS means the service will never be terminated,
and NEVER means the service will be terminated once all of its components have
terminated.
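
To make the two policies above concrete, here is a minimal sketch in Java. It
is not the actual YARN service code: the class, enum, and method names are
hypothetical, and only the ALWAYS and NEVER policies mentioned here are modeled.

```java
import java.util.List;

final class TerminationSketch {
    // Hypothetical enum covering only the two policies named in this issue.
    enum RestartPolicy { ALWAYS, NEVER }

    /**
     * Decides whether the service should be terminated.
     * componentTerminated holds one flag per component (true = terminated).
     */
    static boolean shouldTerminate(RestartPolicy policy,
                                   List<Boolean> componentTerminated) {
        if (policy == RestartPolicy.ALWAYS) {
            // Components are always restarted, so the service is never terminated.
            return false;
        }
        // NEVER: terminate only once every component has terminated.
        return componentTerminated.stream().allMatch(t -> t);
    }
}
```

Neither policy lets a single component decide the outcome of the whole service.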

The name "dominant" might not be the most appropriate; we can figure out a
better name. Put simply, a dominant component is one whose final state
determines the job's final state, regardless of the other components.

Use cases: 

1) A TensorFlow job has master/worker/services/tensorboard components. Once the
master reaches a final state, whether it succeeded or failed, we should
terminate the ps/tensorboard/workers and then mark the job as succeeded/failed
accordingly.
2) Not sure if this is a real-world use case: a service has multiple
components, some of which are not restartable. For such a service, if a
component fails, we should mark the whole service as failed.
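
A rough sketch of the intended behavior for use case 1, in Java. This is only
an illustration under assumptions, not a proposed API: the class, the State
enum, and both method names are hypothetical, and the component names are taken
from the TensorFlow example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class DominantComponentSketch {
    // Hypothetical component states; SUCCEEDED and FAILED are the final ones.
    enum State { RUNNING, SUCCEEDED, FAILED }

    static boolean isFinal(State s) {
        return s == State.SUCCEEDED || s == State.FAILED;
    }

    /**
     * The job's final state is the dominant component's final state,
     * regardless of the other components. Empty while it is still running.
     */
    static Optional<State> jobFinalState(String dominant,
                                         Map<String, State> components) {
        State s = components.get(dominant);
        return (s != null && isFinal(s)) ? Optional.of(s) : Optional.empty();
    }

    /** Components to terminate once the dominant component is final. */
    static List<String> componentsToTerminate(String dominant,
                                              Map<String, State> components) {
        List<String> result = new ArrayList<>();
        if (jobFinalState(dominant, components).isEmpty()) {
            return result; // dominant component not final yet: nothing to do
        }
        for (Map.Entry<String, State> e : components.entrySet()) {
            if (!e.getKey().equals(dominant) && e.getValue() == State.RUNNING) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With "master" as the dominant component, a failed master yields a job state of
FAILED and a list of the still-running ps/tensorboard/worker components to
terminate.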

  was:
The existing YARN service framework supports a termination policy for each
restart policy. For example, ALWAYS means the service will never be terminated,
and NEVER means the service will be terminated once all of its components have
terminated.

Some jobs/services need a different policy. For example, if the TensorFlow
master component terminates (whether it succeeded or failed), we need to
terminate the whole training job regardless of the states of the other
components.

The name "dominant" might not be the most appropriate; we can figure out a
better name. Put simply, a dominant component is one whose final state
determines the job's final state, regardless of the other components.


> Need to support "dominant" component concept inside YARN service
> ----------------------------------------------------------------
>
>                 Key: YARN-8489
>                 URL: https://issues.apache.org/jira/browse/YARN-8489
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: yarn-native-services
>            Reporter: Wangda Tan
>            Priority: Major
>
> The existing YARN service framework supports a termination policy for each
> restart policy. For example, ALWAYS means the service will never be
> terminated, and NEVER means the service will be terminated once all of its
> components have terminated.
> The name "dominant" might not be the most appropriate; we can figure out a
> better name. Put simply, a dominant component is one whose final state
> determines the job's final state, regardless of the other components.
> Use cases:
> 1) A TensorFlow job has master/worker/services/tensorboard components. Once
> the master reaches a final state, whether it succeeded or failed, we should
> terminate the ps/tensorboard/workers and then mark the job as
> succeeded/failed accordingly.
> 2) Not sure if this is a real-world use case: a service has multiple
> components, some of which are not restartable. For such a service, if a
> component fails, we should mark the whole service as failed.



