[ https://issues.apache.org/jira/browse/YARN-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vrushali C reassigned YARN-5357:
--------------------------------

    Assignee: Prabha Manepalli  (was: Varun Saxena)

> Timeline service v2 integration with Federation 
> ------------------------------------------------
>
>                 Key: YARN-5357
>                 URL: https://issues.apache.org/jira/browse/YARN-5357
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Prabha Manepalli
>            Priority: Major
>
> Jira to note the discussion points from an initial chat about integrating 
> Timeline Service v2 with Federation (YARN-2915).
> cc [~subru] [~curino] 
> For Federation:
> - all entities that belong to the same flow run should have the same cluster 
> name
> - app IDs in the same flow run should be strongly ordered in time
> - need both a logical cluster name and a physical cluster name
> - possibility of implementing the Application TimelineCollector as an 
> interceptor in the AMRMProxyService.
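Inline note: the first two Federation invariants above can be expressed as a check over one flow run. This is a minimal sketch, assuming the standard `application_<clusterTimestamp>_<sequence>` string form of YARN app IDs and ordering them by timestamp first, then sequence number. If each sub-cluster RM mints IDs with its own start timestamp, apps of one flow run that land on different sub-clusters need not sort in submission order, which is presumably why the ordering has to be stated as a requirement. The class and method names (`FlowRunChecks`, `strictlyOrdered`, `sameClusterName`) are hypothetical.

```java
import java.util.List;

// Hypothetical helper illustrating two Federation invariants for one flow run:
// every entity carries the same cluster name, and app IDs are strictly ordered in time.
final class FlowRunChecks {

    // Parses "application_<clusterTimestamp>_<sequence>" into {timestamp, sequence}.
    static long[] parseAppId(String appId) {
        String[] parts = appId.split("_");
        if (parts.length != 3 || !parts[0].equals("application")) {
            throw new IllegalArgumentException("Not an application id: " + appId);
        }
        return new long[] {Long.parseLong(parts[1]), Long.parseLong(parts[2])};
    }

    // Orders by cluster timestamp, then by sequence number
    // (assumed to match how YARN's ApplicationId compares).
    static int compareAppIds(String a, String b) {
        long[] x = parseAppId(a);
        long[] y = parseAppId(b);
        int byTimestamp = Long.compare(x[0], y[0]);
        return byTimestamp != 0 ? byTimestamp : Long.compare(x[1], y[1]);
    }

    // True if the app IDs, listed in submission order, are strictly increasing.
    static boolean strictlyOrdered(List<String> appIdsInSubmissionOrder) {
        for (int i = 1; i < appIdsInSubmissionOrder.size(); i++) {
            String previous = appIdsInSubmissionOrder.get(i - 1);
            String current = appIdsInSubmissionOrder.get(i);
            if (compareAppIds(previous, current) >= 0) {
                return false;
            }
        }
        return true;
    }

    // True if every entity in the flow run reports the same cluster name.
    static boolean sameClusterName(List<String> clusterNames) {
        return clusterNames.stream().distinct().count() <= 1;
    }
}
```

For example, `application_1467950000000_0001` followed by `application_1467900000000_0007` fails the ordering check even if the second app really was submitted later.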
> For Timeline Service:
> - need to store both the physical cluster ID and the logical cluster ID so that we don't 
> lose information at any level (flow/app/entity etc.)
> - add a new app-ID-to-cluster mapping table
> - need a different entity table (or some other table) to store node-level metrics for 
> physical cluster stats. Once we get to node-level rollup, we will probably have to 
> store something in a DC, cluster, rack, node hierarchy. In that case a 
> physical cluster makes sense, but we'd still need some way to tie physical 
> and logical clusters together in order to make the automatic error detection etc. that 
> we're envisioning feasible within a federated setup.
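Inline note: a minimal sketch of the proposed app-ID-to-cluster mapping, assuming one row per app that records both the logical (federated) and the physical (sub-cluster) name. It is an in-memory stand-in, not the ATSv2 storage schema; a real version would live in the timeline storage backend (HBase for ATSv2). `AppClusterTable`, `Row`, and `physicalClustersOf` are hypothetical names. The reverse lookup shows one possible way to tie physical and logical clusters together.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the proposed app-ID-to-cluster mapping table.
final class AppClusterTable {

    // One row: the federated (logical) cluster and the sub-cluster (physical) that ran the app.
    static final class Row {
        final String logicalCluster;
        final String physicalCluster;

        Row(String logicalCluster, String physicalCluster) {
            this.logicalCluster = logicalCluster;
            this.physicalCluster = physicalCluster;
        }
    }

    private final Map<String, Row> rowsByAppId = new HashMap<>();

    // Records both names so no information is lost at the app level.
    void put(String appId, String logicalCluster, String physicalCluster) {
        rowsByAppId.put(appId, new Row(logicalCluster, physicalCluster));
    }

    // Point lookup by app ID; empty if the app was never recorded.
    Optional<Row> lookup(String appId) {
        return Optional.ofNullable(rowsByAppId.get(appId));
    }

    // Reverse direction: all physical clusters seen under one logical cluster (sorted).
    Set<String> physicalClustersOf(String logicalCluster) {
        Set<String> result = new TreeSet<>();
        for (Row row : rowsByAppId.values()) {
            if (row.logicalCluster.equals(logicalCluster)) {
                result.add(row.physicalCluster);
            }
        }
        return result;
    }
}
```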
> For the Cluster Naming convention:
> - three situations for the cluster name:
> ----> app submitted to the Router should take the federated (aka logical) cluster name
> ----> app submitted directly to an RM should take the physical cluster name
> ----> info about the physical cluster in entities?
> - suggestion to set the cluster name as a YARN tag at the Router level (in the 
> app submission context)
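Inline note: a minimal sketch of the naming rule, assuming the Router records the federated name as an app tag and the RM-side collector falls back to its own physical name when no such tag is present. The `timeline-cluster:` tag prefix and the `ClusterNaming` methods are hypothetical; in a real Router the tags would be set on the `ApplicationSubmissionContext`, which does expose `setApplicationTags(Set<String>)`. The third case (whether entities also carry physical-cluster info) is left open, as in the notes.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the cluster-naming rule for Router vs. direct-RM submissions.
final class ClusterNaming {

    // Hypothetical tag prefix; the notes only suggest "set the cluster name as a YARN tag".
    static final String CLUSTER_TAG_PREFIX = "timeline-cluster:";

    // Router side: add the federated (logical) cluster name to the app's tags.
    static Set<String> tagAtRouter(Set<String> existingTags, String logicalClusterName) {
        Set<String> tags = new HashSet<>(existingTags);
        tags.add(CLUSTER_TAG_PREFIX + logicalClusterName);
        return tags;
    }

    // RM/collector side: use the Router's tag if present (app came through the Router);
    // otherwise the app was submitted directly to this RM, so use its physical name.
    static String resolveClusterName(Set<String> appTags, String physicalClusterName) {
        Optional<String> fromRouter = appTags.stream()
                .filter(tag -> tag.startsWith(CLUSTER_TAG_PREFIX))
                .map(tag -> tag.substring(CLUSTER_TAG_PREFIX.length()))
                .findFirst();
        return fromRouter.orElse(physicalClusterName);
    }
}
```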
> Other points to note:
> - for federation to work smoothly in environments that use HDFS, some 
> additional considerations are needed, possibly a solution like the nFly 
> approach being used at Twitter.
> Email thread context:
> {code}
> ---------- Forwarded message ----------
> From: Joep Rottinghuis 
> Date: Fri, Jul 8, 2016 at 1:22 PM
> Subject: Re: Federation -Timeline Service meeting notes
> To: Subramaniam Venkatraman Krishnan 
> Cc: Sangjin Lee, Vrushali Channapattan, Carlo Curino
> Thanks for the notes.
> I think that for federation to work smoothly in environments that use HDFS, 
> some additional considerations are needed, and possibly some solution like 
> what we're using at Twitter with our nFly approach.
> bq. - need a different entity table/some table to store node level metrics 
> for physical cluster stats
> Once we get to node-level rollup, we probably have to store something in a 
> DC, cluster, rack, node hierarchy. In that case a physical cluster makes 
> sense, but we'd still need some way to tie physical and logical together in 
> order to make the automatic error detection etc. that we're envisioning feasible 
> within a federated setup.
> Cheers,
> Joep
> On Fri, Jul 8, 2016 at 1:00 PM, Subramaniam Venkatraman Krishnan wrote:
>     Thanks, Vrushali, for crisply capturing the essentials from our rambling 
> discussion :)
>      
>     Sangjin, I just want to add one comment to yours: we want to retain the 
> physical cluster name (possibly as a new entity type) so that we don’t lose 
> information and can still do cluster-level rollups even if they are not efficient.
>      
>     Additionally, based on the walkthrough of the Federation design:
>     - There was general agreement with the proposed approach.
>     - There is a possibility of implementing the Application 
> TimelineCollector as an interceptor in the AMRMProxyService.
>     - Joep raised the concern that it would be better if the RMs 
> obtained the epoch from the FederationStateStore. This is not currently on the 
> roadmap of our MVP, but we definitely plan to address it in the future.
>      
>     Regards,
>     Subru
>      
>     From: Sangjin Lee
>     Sent: Thursday, July 07, 2016 6:22 PM
>     To: Vrushali Channapattan 
>     Cc: Joep Rottinghuis; Carlo Curino; Subramaniam Venkatraman Krishnan 
>     Subject: Re: Federation -Timeline Service meeting notes
>      
>     Thanks for the summary, Vrushali!
>      
>     Just so that we're on the same page regarding the terminology, I 
> understand we're using the terms "logical cluster" and "federated cluster" 
> interchangeably.
>      
>     Also, between using the federated cluster name and the home cluster name 
> as a solution, I think we were leaning towards the federated cluster name 
> (although we did not reach a conclusion).
>      
>     On Thu, Jul 7, 2016 at 4:33 PM, Vrushali Channapattan wrote:
>          
>         For Federation:
>         - all entities that belong to the same flow run should have the same 
> cluster name
>         - app IDs in the same flow run should be strongly ordered in time
>         - need both a logical cluster name and a physical cluster name
>         For Timeline Service:
>         - need to store both the physical cluster ID and the logical cluster ID so that we 
> don't lose information at any level (flow/app/entity etc.)
>         - add a new app-ID-to-cluster mapping table
>         - need a different entity table (or some other table) to store node-level 
> metrics for physical cluster stats
>         For the Cluster Naming convention:
>         - three situations for the cluster name:
>         ----> app submitted to the Router should take the federated cluster name
>         ----> app submitted directly to an RM should take the physical cluster name
>         ----> info about the physical cluster in entities?
>         - suggestion to set the cluster name as a YARN tag at the Router level 
> (in the app submission context)
>  {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
