[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007242#comment-15007242
 ] 

Sangjin Lee commented on YARN-4183:
-----------------------------------

I agree we probably shouldn't raise too many points of discussion here that 
are not core to the JIRA at hand. I'd like to focus on the 
SystemMetricsPublisher and the two flags, 
yarn.resourcemanager.system-metrics-publisher.enabled and 
yarn.timeline-service.enabled.

bq. as far as 2.7.2 is concerned i feel 
yarn.resourcemanager.system-metrics-publisher.enabled is sufficient to be 
configured.

I'm not sure if that is desirable. Here is a key question. Suppose the timeline 
service is disabled, and no timeline daemons are running. And suppose 
yarn.resourcemanager.system-metrics-publisher.enabled is *true*, and we changed 
SystemMetricsPublisher to check only that flag. What would happen? AFAICT, the 
SystemMetricsPublisher will fire up the timeline client, and will try to send 
all the events actively to the timeline server. But since the timeline server 
is down, it will lead to continuous failures of writing to the timeline server, 
right? IMO, this type of very late failure is deeply unsatisfying and 
problematic.

If the answer is "yarn.resourcemanager.system-metrics-publisher.enabled should 
not be set to true if the timeline service is disabled", then it only makes it 
clear that yarn.resourcemanager.system-metrics-publisher.enabled=true implies 
yarn.timeline-service.enabled=true. Then we should check it explicitly. 
Thoughts?
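
To illustrate what an explicit check could look like, here is a minimal 
sketch. It is not actual YARN code: it uses a plain Map as a stand-in for the 
real configuration object, the class and method names are hypothetical, and it 
assumes a missing key means "false". Failing at startup is only one option; 
the point is that the misconfiguration surfaces early instead of as repeated 
write failures later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not actual YARN code: rejects a configuration in which
// the publisher is enabled while the timeline service is disabled.
final class PublisherConfigCheck {
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";
    static final String PUBLISHER_ENABLED =
        "yarn.resourcemanager.system-metrics-publisher.enabled";

    // Assumption for this sketch: a missing key is treated as false.
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.get(key));
    }

    // Throws if the publisher flag is on but the timeline service flag is off.
    static void validate(Map<String, String> conf) {
        if (flag(conf, PUBLISHER_ENABLED) && !flag(conf, TIMELINE_ENABLED)) {
            throw new IllegalStateException(
                PUBLISHER_ENABLED + "=true requires " + TIMELINE_ENABLED + "=true");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PUBLISHER_ENABLED, "true"); // timeline service flag left unset
        try {
            validate(conf);
        } catch (IllegalStateException e) {
            System.out.println("Rejected at startup: " + e.getMessage());
        }
    }
}
```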

bq. As far as i view it "yarn.timeline-service.enabled" name is misleading, it 
should be more to signify client requires the timeline service's delegation 
token. Which will not be a server side config. Thoughts?

I'm not sure if that's how it's currently interpreted, but the way I view it is 
that it should act as a "master switch" for the timeline service; i.e. the 
highest level switch that toggles the feature on and off on all sides. There 
can be "sub-switches" that can control finer-grained parts of the feature (e.g. 
the system metrics publisher). But those subfeatures should always check the 
master switch before checking their own. This will lead to a clean and 
consistent pattern of using the feature everywhere.

Also, consider the fact that the system metrics publisher may not be the only 
server-side component that interacts with the timeline service. There may be 
others and there will be more with the timeline service v.2 (e.g. NM collector 
service, etc.). If they all handle the failure case of the timeline server not 
being up in their own way, it would be quite confusing and error-prone. It 
would be consistent and easy to handle if everyone checks the master switch 
(and possibly their own subfeature switch), and turns the feature off as early 
as possible. So I would argue that yarn.timeline-service.enabled should be 
interpreted as such a "master switch", both for server-side and client-side.
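
The "master switch, then sub-switch" pattern described above could be sketched 
roughly as follows. Again this is only an illustration under stated 
assumptions: a plain Map stands in for the real configuration object, the 
class and method names are hypothetical, and a missing key is treated as 
"false".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not actual YARN code: every timeline-related component
// asks this class whether it should start, so the master switch is always
// consulted before the component's own sub-switch.
final class TimelineSwitches {
    static final String MASTER_SWITCH = "yarn.timeline-service.enabled";
    static final String PUBLISHER_SWITCH =
        "yarn.resourcemanager.system-metrics-publisher.enabled";

    // Assumption for this sketch: a missing key is treated as false.
    private static boolean isOn(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.get(key));
    }

    static boolean isTimelineServiceEnabled(Map<String, String> conf) {
        return isOn(conf, MASTER_SWITCH);
    }

    // A subfeature is enabled only if the master switch AND its own switch are on.
    static boolean isSubfeatureEnabled(Map<String, String> conf, String subSwitch) {
        return isTimelineServiceEnabled(conf) && isOn(conf, subSwitch);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PUBLISHER_SWITCH, "true"); // master switch left unset
        if (!isSubfeatureEnabled(conf, PUBLISHER_SWITCH)) {
            System.out.println("Publisher stays off; no timeline client is created.");
        }
    }
}
```

With this shape, a new server-side component only needs its own sub-switch key 
and a call to isSubfeatureEnabled; it does not need its own handling for the 
case where the timeline server is not running.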

I'd like to hear your thoughts. Thanks!

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4183
>                 URL: https://issues.apache.org/jira/browse/YARN-4183
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>         Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
