[jira] [Commented] (YARN-3981) offline collector: support timeline clients not associated with an application

2017-05-11 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006278#comment-16006278
 ] 

Rohith Sharma K S commented on YARN-3981:
-

Thanks [~vrushalic] for skimming through the doc. 

bq. Do I understand it correctly that flow collectors will run on each node 
that runs an NM in the cluster?
No. The plan is to start one flow collector service, which runs in a single 
container. One flow collector service serves all offline timeline clients. 
However, the number of flow collectors is admin configurable, so there can be 
N collector services. Our TimelineClient will be able to discover all N 
collector services and make use of only one at a time. 
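
A rough sketch of the "discover all N, use only one at a time" idea described above. This is not code from the design doc or from Hadoop: the class name, the `discover` callback, the host names, and the switch-on-failure behaviour are all assumptions made for illustration, and the real TimelineClient may work quite differently.

```python
import random
from typing import Callable, List, Optional


class OfflineCollectorClient:
    """Hypothetical client: knows N collector addresses, uses one at a time."""

    def __init__(self, discover: Callable[[], List[str]],
                 seed: Optional[int] = None):
        # `discover` stands in for whatever discovery mechanism is chosen;
        # it returns the addresses of all currently known flow collectors.
        self._discover = discover
        self._rng = random.Random(seed)
        self._current: Optional[str] = None

    def current_collector(self) -> str:
        # Stick to the collector picked earlier; pick one only on first use.
        if self._current is None:
            self._current = self._pick(exclude=None)
        return self._current

    def report_failure(self) -> str:
        # Assumed behaviour: on a failed write, switch to a different
        # collector out of the N that were discovered.
        self._current = self._pick(exclude=self._current)
        return self._current

    def _pick(self, exclude: Optional[str]) -> str:
        candidates = [c for c in self._discover() if c != exclude]
        if not candidates:
            raise RuntimeError("no flow collector available")
        return self._rng.choice(candidates)


if __name__ == "__main__":
    collectors = ["collector-a.example:10000", "collector-b.example:10000"]
    client = OfflineCollectorClient(lambda: list(collectors))
    print("writing to", client.current_collector())
    print("after a failure, writing to", client.report_failure())
```

Picking at random spreads offline clients across the N collectors without any coordination; a real implementation could just as well use round-robin or a load-aware choice.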

bq. How much traffic do we think might come in? Would it be similar to app 
table writes? If not, is there a possibility we can run this on head node of 
the cluster like where RM or NNs run? Not on the same node as RM but a node 
similar to RM, so that it's "outside" the cluster. We have fairly big sized 
clusters and having each node run a collector may not be optimal.
As of today, traffic is much lower than for app collectors. For example, 
whenever Hive executes a query, the query details are published to ATSv2. But 
we cannot make a decision based on traffic that we cannot predict. 

bq. aggregation is not relevant I think for a flow collector. Or do we want to 
support it? If not, we don't need to mention it under challenges, it is a non 
issue.
Yep, aggregation is not relevant. I will update the doc. Btw, is it possible 
to support aggregation at the flow-run level? 

> offline collector: support timeline clients not associated with an application
> --
>
> Key: YARN-3981
> URL: https://issues.apache.org/jira/browse/YARN-3981
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Rohith Sharma K S
>  Labels: YARN-5355
> Attachments: YARN-3981- offline-collector-draft.pdf
>
>
> In the current v.2 design, all timeline writes must belong in a 
> flow/application context (cluster + user + flow + flow run + application).
> But there are use cases that require writing data outside the context of an 
> application. One such example is a higher level client (e.g. tez client or 
> hive/oozie/cascading client) writing flow-level data that spans multiple 
> applications. We need to find a way to support them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3981) offline collector: support timeline clients not associated with an application

2017-05-10 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005775#comment-16005775
 ] 

Vrushali C commented on YARN-3981:
--

Thanks for the design draft, Rohith. I have some preliminary questions, more 
for discussion. 

- Do I understand correctly that flow collectors will run on each node that 
runs an NM in the cluster? 
- How much traffic do we think might come in? Would it be similar to app table 
writes? If not, is there a possibility we can run this on a head node of the 
cluster, like where the RM or NNs run? Not on the same node as the RM but on a 
node similar to it, so that it's "outside" the cluster. We have fairly large 
clusters, and having each node run a collector may not be optimal. 
- Aggregation is not relevant, I think, for a flow collector. Or do we want to 
support it? If not, we don't need to mention it under challenges; it is a 
non-issue.
- We surely want to think about optimizing connections to HBase.

Perhaps I will have more as I think over this further. 
