[jira] [Updated] (YARN-5357) Timeline service v2 integration with Federation

2019-08-30 Thread Rohith Sharma K S (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-5357:

Parent: YARN-9802  (was: YARN-7055)

> Timeline service v2 integration with Federation 
> 
>
> Key: YARN-5357
> URL: https://issues.apache.org/jira/browse/YARN-5357
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Abhishek Modi
> Priority: Major
>
> Jira to note the discussion points from an initial chat about integrating 
> Timeline Service v2 with Federation (YARN-2915).
> cc [~subru] [~curino] 
> For Federation:
> - all entities that belong to the same flow run should have the same cluster 
> name
> - app ids in the same flow run should be strongly ordered in time
> - need a logical cluster name and a physical cluster name
> - a possibility to implement the Application TimelineCollector as an 
> interceptor in the AMRMProxyService.
> For Timeline Service:
> - need to store physical cluster id and logical cluster id so that we don't 
> lose information at any level (flow/app/entity etc)
> - add a new app id to cluster mapping table
> - need a different entity table/some table to store node level metrics for 
> physical cluster stats. Once we get to node-level rollup, we probably have to 
> store something in a dc, cluster, rack, node hierarchy. In that case a 
> physical cluster makes sense, but we'd still need some way to tie physical 
> and logical together in order to make automatic error detection etc that 
> we're envisioning feasible within a federated setup.
> For the Cluster Naming convention:
> - three situations for cluster name:
> > app submitted to router should take federated (aka logical) cluster name
> > app submitted directly to RM should take physical cluster name
> > Info about the physical cluster in entities?
> - suggestion to set the cluster name as a YARN tag at the router level (in the 
> app submission context) 
> Other points to note:
> - for federation to work smoothly in environments that use HDFS some 
> additional considerations are needed, and possibly some solution like what is 
> being used at Twitter with the nFly approach.
> Email thread context:
> {code}
> -- Forwarded message --
> From: Joep Rottinghuis 
> Date: Fri, Jul 8, 2016 at 1:22 PM
> Subject: Re: Federation -Timeline Service meeting notes
> To: Subramaniam Venkatraman Krishnan 
> Cc: Sangjin Lee, Vrushali Channapattan, Carlo Curino
> Thanks for the notes.
> I think that for federation to work smoothly in environments that use HDFS 
> some additional considerations are needed, and possibly some solution like 
> what we're using at Twitter with our nFly approach.
> bq. - need a different entity table/some table to store node level metrics 
> for physical cluster stats
> Once we get to node-level rollup, we probably have to store something in a 
> dc, cluster, rack, node hierarchy. In that case a physical cluster makes 
> sense, but we'd still need some way to tie physical and logical together in 
> order to make automatic error detection etc that we're envisioning feasible 
> within a federated setup.
> Cheers,
> Joep
> On Fri, Jul 8, 2016 at 1:00 PM, Subramaniam Venkatraman Krishnan wrote:
> Thanks Vrushali for crisply capturing the essentials from our rambling 
> discussion :).
>  
> Sangjin, I just want to add one comment to yours – we want to retain the 
> physical cluster name (possibly as a new entity type) so that we don’t lose 
> information & we can do cluster level rollups even if they are not efficient.
>  
> Additionally, based on the walkthrough of Federation design:
> · There was general agreement with the proposed approach.
> · There is a possibility to implement the Application 
> TimelineCollector as an interceptor in the AMRMProxyService.
> · Joep raised the concern that it would be better if the RMs 
> obtain the epoch from FederationStateStore. This is not currently in the 
> roadmap of our MVP but we definitely plan to address this in future.
>  
> Regards,
> Subru
>  
> From: Sangjin Lee
> Sent: Thursday, July 07, 2016 6:22 PM
> To: Vrushali Channapattan 
> Cc: Joep Rottinghuis; Carlo Curino; Subramaniam Venkatraman Krishnan 
> Subject: Re: Federation -Timeline Service meeting notes
>  
> Thanks for the summary Vrushali!
>  
> Just so that we're on the same page regarding the terminology, I 
> understand we're using the terms "logical cluster" and "federated cluster" 
> interchangeably.
>  
> Also, between using the federated cluster name and the home cluster name 
> as a solution, I think we were leaning towards the federated cluster name 
> (although not 

[jira] [Updated] (YARN-5357) Timeline service v2 integration with Federation

2017-08-18 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-5357:
Parent Issue: YARN-7055  (was: YARN-5355)

> Timeline service v2 integration with Federation 
> 
>
> Key: YARN-5357
> URL: https://issues.apache.org/jira/browse/YARN-5357
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Varun Saxena
>

[jira] [Updated] (YARN-5357) Timeline service v2 integration with Federation

2016-07-11 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-5357:
Description: 
Jira to note the discussion points from an initial chat about integrating 
Timeline Service v2 with Federation (YARN-2915).

cc [~subru] [~curino] 

For Federation:
- all entities that belong to the same flow run should have the same cluster 
name
- app ids in the same flow run should be strongly ordered in time
- need a logical cluster name and a physical cluster name
- a possibility to implement the Application TimelineCollector as an 
interceptor in the AMRMProxyService.
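
A minimal sketch of the first two Federation points above, written as a check over one flow run. It assumes the usual YARN application id layout (application_<clusterTimestamp>_<sequence>) and assumes that "strongly ordered" can be judged by comparing those two numeric parts; that comparison, and the function names, are assumptions of this sketch rather than anything agreed in the discussion.

```python
from typing import List, Tuple


def parse_app_id(app_id: str) -> Tuple[int, int]:
    """Split 'application_<clusterTimestamp>_<sequence>' into its numeric parts."""
    prefix, cluster_ts, seq = app_id.split("_")
    if prefix != "application":
        raise ValueError(f"not an application id: {app_id}")
    return int(cluster_ts), int(seq)


def flow_run_is_consistent(apps: List[Tuple[str, str]]) -> bool:
    """Check one flow run given (app_id, cluster_name) pairs in submission order.

    Hypothetical helper: returns True only if every app carries the same
    cluster name and the app ids are strictly increasing.
    """
    cluster_names = {cluster for _, cluster in apps}
    keys = [parse_app_id(app_id) for app_id, _ in apps]
    strictly_ordered = all(a < b for a, b in zip(keys, keys[1:]))
    return len(cluster_names) <= 1 and strictly_ordered
```

If sub-clusters hand out ids from independent RMs, a check like this can fail even for a well-behaved flow, which is one reason the ordering requirement is listed separately.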

For Timeline Service:
- need to store physical cluster id and logical cluster id so that we don't 
lose information at any level (flow/app/entity etc)
- add a new app id to cluster mapping table
- need a different entity table/some table to store node level metrics for 
physical cluster stats. Once we get to node-level rollup, we probably have to 
store something in a dc, cluster, rack, node hierarchy. In that case a physical 
cluster makes sense, but we'd still need some way to tie physical and logical 
together in order to make automatic error detection etc that we're envisioning 
feasible within a federated setup.
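
As a rough illustration of the storage points above, the sketch below keeps both cluster ids together and maintains an app id to cluster mapping. It is an in-memory stand-in, not the Timeline Service schema; ClusterRef, AppToClusterTable and their fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClusterRef:
    """Hypothetical pair tying a logical (federated) cluster to a physical one."""
    logical_cluster_id: str
    physical_cluster_id: str


class AppToClusterTable:
    """In-memory stand-in for the proposed app id to cluster mapping table."""

    def __init__(self) -> None:
        self._by_app: Dict[str, ClusterRef] = {}

    def put(self, app_id: str, ref: ClusterRef) -> None:
        """Record which logical and physical cluster an app belongs to."""
        self._by_app[app_id] = ref

    def get(self, app_id: str) -> ClusterRef:
        """Look up both cluster ids for an app (raises KeyError if unknown)."""
        return self._by_app[app_id]

    def apps_on_physical(self, physical_cluster_id: str) -> List[str]:
        """Apps that ran on one physical cluster, whatever their logical cluster."""
        return sorted(
            app for app, ref in self._by_app.items()
            if ref.physical_cluster_id == physical_cluster_id
        )
```

Keeping the physical id next to the logical one is what would let physical cluster stats be tied back to federated flows later on.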


For the Cluster Naming convention:
- three situations for cluster name:
> app submitted to router should take federated (aka logical) cluster name
> app submitted directly to RM should take physical cluster name
> Info about the physical cluster in entities?
- suggestion to set the cluster name as a YARN tag at the router level (in the 
app submission context) 
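
The naming rules above could look roughly like this. It is only an illustration of the discussion, not Federation or Timeline Service code: the Submission type, the tag_at_router step and the "cluster:" tag prefix are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tag prefix; the notes only suggest carrying the name as a YARN tag.
CLUSTER_TAG_PREFIX = "cluster:"


@dataclass(frozen=True)
class Submission:
    """Hypothetical view of an app submission context, for illustration only."""
    physical_cluster: str              # cluster whose RM accepted the app
    tags: frozenset = frozenset()      # application tags set at submission


def tag_at_router(sub: Submission, federated_cluster: str) -> Submission:
    """Router-side step: record the federated (logical) cluster name as a tag."""
    return Submission(
        physical_cluster=sub.physical_cluster,
        tags=sub.tags | {CLUSTER_TAG_PREFIX + federated_cluster},
    )


def timeline_cluster_name(sub: Submission) -> str:
    """Cluster name under which the app's entities would be written."""
    for tag in sorted(sub.tags):
        if tag.startswith(CLUSTER_TAG_PREFIX):
            # Submitted through the router: use the federated name.
            return tag[len(CLUSTER_TAG_PREFIX):]
    # Submitted directly to an RM: fall back to the physical name.
    return sub.physical_cluster
```

The physical cluster name stays on the submission in both cases, so the open question of how to expose it in entities is left untouched by this sketch.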

Other points to note:
- for federation to work smoothly in environments that use HDFS some additional 
considerations are needed, and possibly some solution like what is being used 
at Twitter with the nFly approach.


Email thread context:

{code}

-- Forwarded message --
From: Joep Rottinghuis 
Date: Fri, Jul 8, 2016 at 1:22 PM
Subject: Re: Federation -Timeline Service meeting notes
To: Subramaniam Venkatraman Krishnan 
Cc: Sangjin Lee, Vrushali Channapattan, Carlo Curino


Thanks for the notes.

I think that for federation to work smoothly in environments that use HDFS some 
additional considerations are needed, and possibly some solution like what 
we're using at Twitter with our nFly approach.

bq. - need a different entity table/some table to store node level metrics for 
physical cluster stats
Once we get to node-level rollup, we probably have to store something in a dc, 
cluster, rack, node hierarchy. In that case a physical cluster makes sense, but 
we'd still need some way to tie physical and logical together in order to make 
automatic error detection etc that we're envisioning feasible within a 
federated setup.

Cheers,

Joep

On Fri, Jul 8, 2016 at 1:00 PM, Subramaniam Venkatraman Krishnan wrote:

Thanks Vrushali for crisply capturing the essentials from our rambling 
discussion :).

 

Sangjin, I just want to add one comment to yours – we want to retain the 
physical cluster name (possibly as a new entity type) so that we don’t lose 
information & we can do cluster level rollups even if they are not efficient.

 

Additionally, based on the walkthrough of Federation design:

· There was general agreement with the proposed approach.

· There is a possibility to implement the Application 
TimelineCollector as an interceptor in the AMRMProxyService.

· Joep raised the concern that it would be better if the RMs obtain 
the epoch from FederationStateStore. This is not currently in the roadmap of 
our MVP but we definitely plan to address this in future.

 

Regards,

Subru

 

From: Sangjin Lee
Sent: Thursday, July 07, 2016 6:22 PM
To: Vrushali Channapattan 
Cc: Joep Rottinghuis; Carlo Curino; Subramaniam Venkatraman Krishnan 
Subject: Re: Federation -Timeline Service meeting notes

 

Thanks for the summary Vrushali!

 

Just so that we're on the same page regarding the terminology, I understand 
we're using the terms "logical cluster" and "federated cluster" interchangeably.

 

Also, between using the federated cluster name and the home cluster name as 
a solution, I think we were leaning towards the federated cluster name 
(although not concluded).

 

On Thu, Jul 7, 2016 at 4:33 PM, Vrushali Channapattan wrote:

 

For Federation:

- all entities that belong to the same flow run should have the same 
cluster name

- app ids in the same flow run should be strongly ordered in time

- need a logical cluster name and physical cluster name

For Timeline Service:

- need to store physical cluster id and logical cluster id so that we 
don't lose information at any level (flow/app/entity etc)

- add a new app id to cluster mapping table

- need a