[ 
https://issues.apache.org/jira/browse/YARN-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827214#comment-15827214
 ] 

Joep Rottinghuis commented on YARN-6105:
----------------------------------------

+1 for separate jira, thanks for filing.
+1 for some functionality like this.
Not sure this should be marked as a merge-blocker. This will likely result in 
an additional table, but that isn't quite the same as a backwards-incompatible 
schema change.

Wrt. implementation:
If you are indeed in a scenario with ephemeral clusters (say, launch a 
cluster for a job and then shut it down), then the number of clusters will 
certainly grow very rapidly. Having an API that returns _ALL_ of them is 
probably not desirable, as it will get slower and slower until the total 
response size exceeds some limit, at which point it will no longer work.

We should consider something like the flow activity table to be able to have a 
list of all the clusters that have been active over a certain period of time. 
For a static cluster setup, it would seem somewhat wasteful to store every 
cluster on a daily basis, so some compromise might have to be struck here.

Perhaps all clusters active in a month (sort of aligning with a typical billing 
cycle) or per week.
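
To make the bucketing idea concrete, here is a rough sketch in Python rather than 
anything resembling the actual timeline service code. The class and method names 
(ClusterActivityIndex, record, clusters_active_in) are hypothetical, and an 
in-memory dict stands in for whatever table would really back this. The point is 
only that a cluster is stored once per month no matter how many apps it runs, and 
that a query is bounded by the period asked for.

```python
from datetime import datetime, timezone


def month_bucket(ts_ms: int) -> str:
    """Return the UTC month ("YYYY-MM") that a millisecond timestamp falls in."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m")


class ClusterActivityIndex:
    """Hypothetical in-memory stand-in for a per-period cluster activity table."""

    def __init__(self):
        # Maps a month bucket to the set of cluster ids seen in that month.
        self._active = {}

    def record(self, cluster_id: str, ts_ms: int) -> None:
        """Mark a cluster as active in the month containing ts_ms."""
        # Adding to a set is idempotent, so a static cluster costs one entry
        # per month regardless of how many applications report activity.
        self._active.setdefault(month_bucket(ts_ms), set()).add(cluster_id)

    def clusters_active_in(self, bucket: str) -> list:
        """Return the sorted cluster ids that were active in the given month."""
        return sorted(self._active.get(bucket, set()))
```

Switching to weekly buckets would only change month_bucket; the trade-off is more 
entries for static clusters in exchange for finer-grained queries.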

Even if we only write this once for each launched YARN application, then for 
large persistent clusters that run many tens of thousands of jobs per day, that 
would still be quite some overhead. Maybe this would be a good candidate for one 
of the first use-cases for offline processing, where some job can walk the data 
once per day and create a "list of clusters with a count of the apps/flows on 
them", possibly even with a sum of total MB-ms to assess the total "cost" for 
these clusters.
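
As an illustration only, the daily pass could look something like the Python 
sketch below. AppRecord and summarize_clusters are hypothetical names, and the 
input is assumed to be one record per application carrying its cluster id, flow 
name and MB-ms usage; how those records would actually be read from the backing 
store is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRecord:
    """Hypothetical per-application record read by the offline job."""
    cluster_id: str
    flow_name: str
    mb_millis: int  # memory usage in MB-ms, assumed to be pre-computed


def summarize_clusters(apps):
    """Aggregate app count, distinct flow count and total MB-ms per cluster."""
    app_counts = defaultdict(int)
    flows = defaultdict(set)
    mb_millis = defaultdict(int)
    for app in apps:
        app_counts[app.cluster_id] += 1
        flows[app.cluster_id].add(app.flow_name)
        mb_millis[app.cluster_id] += app.mb_millis
    # One summary row per cluster, ordered by cluster id.
    return {
        cid: {
            "apps": app_counts[cid],
            "flows": len(flows[cid]),
            "mb_millis": mb_millis[cid],
        }
        for cid in sorted(app_counts)
    }
```

Because the job runs once per day over data that is already stored, the write 
path for each application stays untouched, which is the overhead concern above.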

> Support for new REST end point /clusterids
> ------------------------------------------
>
>                 Key: YARN-6105
>                 URL: https://issues.apache.org/jira/browse/YARN-6105
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>              Labels: yarn-5355-merge-blocker
>
> As discussed in YARN-5378 and YARN-6095, it is required to have a */clusterids* 
> endpoint that returns the list of clusterIds that the back end has. 
> Use case: In the cloud, clusters are arbitrarily spun up and destroyed. Each 
> cluster has its own clusterId, which the UI never knows about. The same ATS 
> server and the same web UI are used for all of those newly spun-up clusters. 
> An admin can select a clusterId and navigate to any page. So it is worthwhile 
> to list the clusterIds from ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
