[
https://issues.apache.org/jira/browse/YARN-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vrushali C reassigned YARN-5357:
--------------------------------
Assignee: Prabha Manepalli (was: Varun Saxena)
> Timeline service v2 integration with Federation
> ------------------------------------------------
>
> Key: YARN-5357
> URL: https://issues.apache.org/jira/browse/YARN-5357
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Prabha Manepalli
> Priority: Major
>
> Jira to note the discussion points from an initial chat about integrating
> Timeline Service v2 with Federation (YARN-2915).
> cc [~subru] [~curino]
> For Federation:
> - all entities that belong to the same flow run should have the same cluster
> name
> - app id in the same flow run strongly ordered in time
> - need a logical cluster name and physical cluster name
> - a possibility to implement the Application TimelineCollector as an
> interceptor in the AMRMProxyService.
> For Timeline Service:
> - need to store physical cluster id and logical cluster id so that we don't
> lose information at any level (flow/app/entity etc)
> - add a new app id to cluster mapping table
> - need a different entity table/some table to store node level metrics for
> physical cluster stats. Once we get to node-level rollup, we probably have to
> store something in a dc, cluster, rack, node hierarchy. In that case a
> physical cluster makes sense, but we'd still need some way to tie physical
> and logical together in order to make automatic error detection etc that
> we're envisioning feasible within a federated setup.
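A minimal sketch of the "app id to cluster mapping" idea noted above, kept in memory purely for illustration. The names ClusterIds and AppToClusterMapping are hypothetical, not Timeline Service classes, and the sketch assumes each app is recorded against a single physical cluster (for example its home sub-cluster), which the notes do not settle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ClusterIds:
    """Both cluster identifiers kept for one application (hypothetical type)."""
    logical_cluster_id: str   # federated cluster name shared by the flow run
    physical_cluster_id: str  # assumed: the single physical cluster recorded for the app


class AppToClusterMapping:
    """In-memory stand-in for the proposed app id to cluster mapping table."""

    def __init__(self) -> None:
        self._rows: Dict[str, ClusterIds] = {}

    def put(self, app_id: str, ids: ClusterIds) -> None:
        # One row per app id, so neither identifier is lost.
        self._rows[app_id] = ids

    def get(self, app_id: str) -> Optional[ClusterIds]:
        return self._rows.get(app_id)

    def apps_on_physical_cluster(self, physical_cluster_id: str) -> List[str]:
        # Reverse lookup that physical-cluster-level stats would need.
        return sorted(
            app_id
            for app_id, ids in self._rows.items()
            if ids.physical_cluster_id == physical_cluster_id
        )
```

Keeping both ids in one row is one way to tie the physical and logical views together; a real table design would also have to cover flow- and entity-level rows.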
> For the Cluster Naming convention:
> - three situations for cluster name:
> ----> app submitted to router should take federated (aka logical) cluster name
> ----> app submitted directly to RM should take physical cluster name
> ----> Info about the physical cluster in entities?
> - suggestion to set the cluster name as yarn tag at the router level (in the
> app submission context)
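The naming convention above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Federation router's behavior: the tag prefix "cluster_name:" and both function names are made up here, and the third situation (physical cluster info in entities) is left open, as in the notes.

```python
from typing import List

# Hypothetical tag format; the notes only suggest "cluster name as yarn tag".
CLUSTER_TAG_PREFIX = "cluster_name:"


def tag_with_cluster_name(app_tags: List[str], logical_cluster: str) -> List[str]:
    """What a router might do at submission: add the federated cluster name as a tag."""
    kept = [t for t in app_tags if not t.startswith(CLUSTER_TAG_PREFIX)]
    return kept + [CLUSTER_TAG_PREFIX + logical_cluster]


def resolve_cluster_name(app_tags: List[str], physical_cluster: str) -> str:
    """Pick the cluster name to record for an app.

    Submitted through the router: the tag is present, use the logical name.
    Submitted directly to an RM: no tag, fall back to the physical name.
    """
    for tag in app_tags:
        if tag.startswith(CLUSTER_TAG_PREFIX):
            return tag[len(CLUSTER_TAG_PREFIX):]
    return physical_cluster
```

For example, an app tagged by the router with the name "fed-cluster" resolves to "fed-cluster" no matter which sub-cluster runs it, while an untagged app resolves to the physical cluster name of the RM it was submitted to.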
> Other points to note:
> - for federation to work smoothly in environments that use HDFS some
> additional considerations are needed, and possibly some solution like what is
> being used at Twitter with the nFly approach.
> Email thread context:
> {code}
> ---------- Forwarded message ----------
> From: Joep Rottinghuis
> Date: Fri, Jul 8, 2016 at 1:22 PM
> Subject: Re: Federation -Timeline Service meeting notes
> To: Subramaniam Venkatraman Krishnan
> Cc: Sangjin Lee, Vrushali Channapattan, Carlo Curino
> Thanks for the notes.
> I think that for federation to work smoothly in environments that use HDFS
> some additional considerations are needed, and possibly some solution like
> what we're using at Twitter with our nFly approach.
> bq. - need a different entity table/some table to store node level metrics
> for physical cluster stats
> Once we get to node-level rollup, we probably have to store something in a
> dc, cluster, rack, node hierarchy. In that case a physical cluster makes
> sense, but we'd still need some way to tie physical and logical together in
> order to make automatic error detection etc that we're envisioning feasible
> within a federated setup.
> Cheers,
> Joep
> On Fri, Jul 8, 2016 at 1:00 PM, Subramaniam Venkatraman Krishnan wrote:
> Thanks Vrushali for crisply capturing the essentials from our rambling
> discussion :)
>
> Sangjin, I just want to add one comment to yours – we want to retain the
> physical cluster name (possibly as a new entity type) so that we don’t lose
> information & we can do cluster-level rollups even if they are not efficient.
>
> Additionally, based on the walkthrough of Federation design:
> · There was general agreement with the proposed approach.
> · There is a possibility to implement the Application
> TimelineCollector as an interceptor in the AMRMProxyService.
> · Joep raised the concern that it would be better if the RMs
> obtain the epoch from FederationStateStore. This is not currently in the
> roadmap of our MVP but we definitely plan to address this in future.
>
> Regards,
> Subru
>
> From: Sangjin Lee
> Sent: Thursday, July 07, 2016 6:22 PM
> To: Vrushali Channapattan
> Cc: Joep Rottinghuis; Carlo Curino; Subramaniam Venkatraman Krishnan
> Subject: Re: Federation -Timeline Service meeting notes
>
> Thanks for the summary Vrushali!
>
> Just so that we're on the same page regarding the terminology, I
> understand we're using the terms "logical cluster" and "federated cluster"
> interchangeably.
>
> Also, between using the federated cluster name and the home cluster name
> as a solution, I think we were leaning towards the federated cluster name
> (although not concluded).
>
> On Thu, Jul 7, 2016 at 4:33 PM, Vrushali Channapattan wrote:
>
> For Federation:
> - all entities that belong to the same flow run should have the same
> cluster name
> - app id in the same flow run strongly ordered in time
> - need a logical cluster name and physical cluster name
> For Timeline Service:
> - need to store physical cluster id and logical cluster id so that we
> don't lose information at any level (flow/app/entity etc)
> - add a new app id to cluster mapping table
> - need a different entity table/some table to store node level
> metrics for physical cluster stats
> For the Cluster Naming convention:
> - three situations for cluster name:
> ----> app submitted to router should take federated cluster name
> ----> app submitted directly to RM should take physical cluster name
> ----> Info about the physical cluster in entities?
> - suggestion to set the cluster name as yarn tag at the router level
> (in the app submission context)
> {code}