[ 
https://issues.apache.org/jira/browse/YARN-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652684#comment-16652684
 ] 

Eric Yang commented on YARN-8489:
---------------------------------

{quote}If it is never, dominant field will be ignored. Otherwise dominant field 
is allowed.{quote}

If we go by what you proposed, the user's expectations of the dominant field 
and the restart policy will not be met.  The earlier comment proposed cleaning 
up the other components when the dominant component finishes.  The dominant 
component could be a batch job that should not be repeated.  Ignoring the 
field does not sound like the right solution here.

Changing the dependent component state to FAILED to signal other components to 
terminate seems like a more intuitive approach to address the state transition 
problem.  This ensures that a state change triggered by the restart policy or 
by an upgrade requires no additional logic to safeguard the dominant component.

{quote}
Component.state: 
- Transition to SUCCEEDED && component.dominant == true: Set service state to 
SUCCEEDED. 
- Transition to FAILED && component.dominant == true. Set service state to 
FAILED. 
{quote}

This looks like you want the service to report success or failure based on the 
status of the "important" component, instead of requiring every component to 
report SUCCEEDED for the service state to be SUCCEEDED.  A safer approach to 
enable this logic is to have a boolean flag at the component level, 
"report_as_service_state":true.  This requires no alteration to the state 
transition logic; it only adds a check at the end.
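
To illustrate the "check at the end" idea, here is a minimal Java sketch.  It 
is not the actual YARN service code: the class, the Component type and the 
resolveServiceState method are hypothetical names, and only the flag itself 
comes from the suggestion above.  It assumes every component has already 
reached a terminal state, and that if several components carry the flag, any 
flagged FAILED component makes the service FAILED.

```java
import java.util.List;

public class ServiceStateSketch {
    enum State { SUCCEEDED, FAILED }

    // Hypothetical stand-in for a component; reportAsServiceState mirrors
    // the proposed "report_as_service_state" flag.
    static final class Component {
        final String name;
        final State finalState;
        final boolean reportAsServiceState;

        Component(String name, State finalState, boolean reportAsServiceState) {
            this.name = name;
            this.finalState = finalState;
            this.reportAsServiceState = reportAsServiceState;
        }
    }

    // Final check, run once all components are in a terminal state.
    // The state transition logic itself is left untouched.
    static State resolveServiceState(List<Component> components) {
        boolean anyFlagged = false;
        for (Component c : components) {
            if (c.reportAsServiceState) {
                anyFlagged = true;
                if (c.finalState == State.FAILED) {
                    return State.FAILED; // assumption: any flagged failure wins
                }
            }
        }
        if (anyFlagged) {
            return State.SUCCEEDED; // all flagged components succeeded
        }
        // No flagged component: every component must report SUCCEEDED.
        for (Component c : components) {
            if (c.finalState != State.SUCCEEDED) {
                return State.FAILED;
            }
        }
        return State.SUCCEEDED;
    }
}
```

With a flagged master that SUCCEEDED and an unflagged worker that FAILED, 
resolveServiceState returns SUCCEEDED.  With the same states and no flag on 
any component, it returns FAILED, because every component must then succeed.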

> Need to support "dominant" component concept inside YARN service
> ----------------------------------------------------------------
>
>                 Key: YARN-8489
>                 URL: https://issues.apache.org/jira/browse/YARN-8489
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: yarn-native-services
>            Reporter: Wangda Tan
>            Priority: Major
>
> Existing YARN service supports termination policies for different restart 
> policies. For example, ALWAYS means the service will not be terminated, and 
> NEVER means that once all components have terminated, the service will be 
> terminated.
> The name "dominant" might not be the most appropriate; we can figure out a 
> better name. Put simply, it means a component whose final state determines 
> the job's final state, regardless of the other components.
> Use cases: 
> 1) A Tensorflow job has master/worker/services/tensorboard. Once the master 
> reaches a final state, whether it succeeded or failed, we should terminate 
> ps/tensorboard/workers and then mark the job as succeeded/failed. 
> 2) Not sure if this is a real-world use case: a service has multiple 
> components, some of which are not restartable. For such services, if a 
> component fails, we should mark the whole service as failed. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
