[
https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006278#comment-16006278
]
Rohith Sharma K S commented on YARN-3981:
-
Thanks [~vrushalic] for skimming through the doc.
bq. Do I understand it correctly that flow collectors will run on each node
that runs an NM in the cluster?
No. The plan is to start one flow collector service that runs in a single
container. That one flow collector service serves all offline timeline clients.
However, the number of flow collectors is admin-configurable, so there can be N
collector services. Our TimelineClient will be able to discover all N collector
services and use only one at a time.
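To illustrate the "discover N, use one at a time" idea, here is a rough sketch. It is not the actual TimelineClient API: the class `CollectorSelector`, its methods, and the collector addresses are all hypothetical, and the failover-to-next-collector behaviour is an assumption about how a client might cope with an unreachable collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical helper, not part of Hadoop: holds the N discovered flow
// collector addresses and sticks to one of them until a publish fails.
final class CollectorSelector {
    private final List<String> collectors;
    private int current = 0; // index of the collector currently in use

    CollectorSelector(List<String> discovered) {
        if (discovered.isEmpty()) {
            throw new IllegalArgumentException("no flow collectors discovered");
        }
        this.collectors = new ArrayList<>(discovered);
    }

    // Address of the single collector the client is using right now.
    String current() {
        return collectors.get(current);
    }

    // Tries the current collector first; on failure moves to the next one.
    // 'sender' stands in for the real network call and returns true on success.
    // Returns the address that accepted the entity.
    String publish(String entity, BiPredicate<String, String> sender) {
        for (int attempt = 0; attempt < collectors.size(); attempt++) {
            String address = collectors.get(current);
            if (sender.test(address, entity)) {
                return address;
            }
            current = (current + 1) % collectors.size();
        }
        throw new IllegalStateException("no reachable flow collector");
    }
}
```

With three discovered collectors, a client built this way keeps writing to the first one and only switches when a write to it fails, so at any moment exactly one collector is in use per client.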
bq. How much traffic do we think might come in? Would it be similar to app
table writes? If not, is there a possibility we can run this on head node of
the cluster like where RM or NNs run? Not on the same node as RM but a node
similar to RM, so that it's "outside" the cluster. We have fairly big sized
clusters and having each node run a collector may not be optimal.
As of today, traffic is much lower than for app collectors. For example,
whenever Hive executes a query, the query details are published to ATSv2. But
we cannot commit to a traffic estimate, since the volume is not predictable.
bq. aggregation is not relevant I think for a flow collector. Or do we want to
support it? If not, we don't need to mention it under challenges, it is a non
issue.
Yep, aggregation is not relevant. I will update the doc. Btw, is it possible
to support aggregation at the flow-run level?
> offline collector: support timeline clients not associated with an application
> --
>
> Key: YARN-3981
> URL: https://issues.apache.org/jira/browse/YARN-3981
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Rohith Sharma K S
> Labels: YARN-5355
> Attachments: YARN-3981- offline-collector-draft.pdf
>
>
> In the current v.2 design, all timeline writes must belong in a
> flow/application context (cluster + user + flow + flow run + application).
> But there are use cases that require writing data outside the context of an
> application. One such example is a higher level client (e.g. tez client or
> hive/oozie/cascading client) writing flow-level data that spans multiple
> applications. We need to find a way to support them.