[
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007242#comment-15007242
]
Sangjin Lee commented on YARN-4183:
-----------------------------------
I agree we probably shouldn't put too many points of discussion here that may
not be core to the JIRA at hand. I'd like to focus on the
SystemMetricsPublisher and how it should treat two flags:
yarn.resourcemanager.system-metrics-publisher.enabled and
yarn.timeline-service.enabled.
bq. as far as 2.7.2 is concerned i feel
yarn.resourcemanager.system-metrics-publisher.enabled is sufficient to be
configured.
I'm not sure if that is desirable. Here is a key question. Suppose the timeline
service is disabled, and no timeline daemons are running. And suppose
yarn.resourcemanager.system-metrics-publisher.enabled is *true*, and we changed
SystemMetricsPublisher to check only that flag. What would happen? AFAICT, the
SystemMetricsPublisher will fire up the timeline client, and will try to send
all the events actively to the timeline server. But since the timeline server
is down, it will lead to continuous failures of writing to the timeline server,
right? IMO, this type of very late failure is deeply unsatisfying and
problematic.
If the answer is "yarn.resourcemanager.system-metrics-publisher.enabled should
not be set to true if the timeline service is disabled", then that only makes
it clear that yarn.resourcemanager.system-metrics-publisher.enabled=true implies
yarn.timeline-service.enabled=true. In that case, we should check it explicitly.
Thoughts?
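To make "check it explicitly" concrete, here is a minimal sketch of the check I
have in mind. It is not the actual SystemMetricsPublisher code: PublisherSwitch,
flag() and shouldPublish() are hypothetical names, a plain Map stands in for the
YARN configuration object, and treating a missing key as false is an assumption
of the sketch.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not the real SystemMetricsPublisher. */
final class PublisherSwitch {
    // The two property names discussed in this comment.
    static final String TIMELINE_SERVICE_ENABLED = "yarn.timeline-service.enabled";
    static final String SYSTEM_METRICS_PUBLISHER_ENABLED =
        "yarn.resourcemanager.system-metrics-publisher.enabled";

    private PublisherSwitch() {}

    /** Reads a boolean flag; a missing key counts as false (assumption of this sketch). */
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    /**
     * The publisher should create a timeline client only when both flags are
     * true, so a disabled timeline service is detected at startup rather than
     * through repeated write failures later.
     */
    static boolean shouldPublish(Map<String, String> conf) {
        return flag(conf, TIMELINE_SERVICE_ENABLED)
            && flag(conf, SYSTEM_METRICS_PUBLISHER_ENABLED);
    }
}
```

With only yarn.resourcemanager.system-metrics-publisher.enabled set to true,
shouldPublish() returns false, so no timeline client would be started.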
bq. As far as i view it "yarn.timeline-service.enabled" name is misleading, it
should be more to signify client requires the timeline service's delegation
token. Which will not be a server side config. Thoughts?
I'm not sure if that's how it's currently interpreted, but the way I view it is
that it should act as a "master switch" for the timeline service; i.e. the
highest level switch that toggles the feature on and off on all sides. There
can be "sub-switches" that can control finer-grained parts of the feature (e.g.
the system metrics publisher). But those subfeatures should always check the
master switch before checking their own. This will lead to a clean and
consistent pattern of using the feature everywhere.
Also, consider the fact that the system metrics publisher may not be the only
server-side component that interacts with the timeline service. There may be
others and there will be more with the timeline service v.2 (e.g. NM collector
service, etc.). If they all handle the failure case of the timeline server not
being up in their own way, it would be quite confusing and error-prone. It
would be consistent and easy to handle if everyone checked the master switch
(and possibly their own subfeature switch) and turned the feature off as early
as possible. So I would argue that yarn.timeline-service.enabled should be
interpreted as such a "master switch", both for server-side and client-side.
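As a rough illustration of the pattern, every component could go through one
shared gate. This is only a sketch under stated assumptions: TimelineFeatureGate
and its methods are hypothetical names, a plain Map stands in for the YARN
configuration object, and a missing key is treated as false.

```java
import java.util.Map;

/** Hypothetical shared gate for illustration; not an existing YARN class. */
final class TimelineFeatureGate {
    // The master switch discussed in this comment.
    static final String MASTER_SWITCH = "yarn.timeline-service.enabled";

    private TimelineFeatureGate() {}

    /** Reads a boolean flag; a missing key counts as false (assumption of this sketch). */
    static boolean isOn(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    /**
     * Checks the master switch first, then the component's own sub-switch.
     * Pass null as subSwitch for a component that has no switch of its own.
     */
    static boolean isEnabled(Map<String, String> conf, String subSwitch) {
        if (!isOn(conf, MASTER_SWITCH)) {
            return false; // master switch off: the feature is off everywhere
        }
        return subSwitch == null || isOn(conf, subSwitch);
    }
}
```

The system metrics publisher would call isEnabled() with
yarn.resourcemanager.system-metrics-publisher.enabled as its sub-switch, and a
component without a sub-switch would pass null and depend on the master switch
alone.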
I'd like to hear your thoughts. Thanks!
> Enabling generic application history forces every job to get a timeline
> service delegation token
> ------------------------------------------------------------------------------------------------
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.1
> Reporter: Mit Desai
> Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server,
> the system metrics publisher will not publish the events to the timeline
> store as it checks if the timeline server and system metrics publisher are
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if
> application history server is enabled.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)