[
https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633315#comment-15633315
]
Junping Du commented on YARN-5814:
----------------------------------
Thanks [~BINGXUE QIU] for reporting this issue.
I think this use case and implementation from Alibaba could benefit our
community for several reasons:
1. It will showcase that our ATS v2 design and implementation are flexible
enough to support different storage backends for different use cases: NoSQL
(HBase), filesystem (HDFS, which folks from NTT seem to be working on) and, of
course, OLAP implementations.
2. Our current backend implementation on HBase lacks ad-hoc queries on
timeline info. Under our previous assumption that timeline info is accessed
only in limited ways, such as getting runtime or offline aggregation info from
the UI, this is not a problem. However, if we would like to support
interactive queries on timeline info on a large and busy cluster, HBase may
not be the best fit. I believe YARN users other than Alibaba will have similar
requirements if we consider analysis of YARN application info to be a big data
problem in its own right, and the proposed effort can expand the scenarios
that ATS v2 covers.
I think we should consider merging this proposal into our ongoing ATS v2 effort
(maybe under YARN-5355?) if [~BINGXUE QIU] can work out a more concrete design.
ATS v2 folks ([~sjlee0], [~vinodkv], [~gtCarrera9], [~vrushalic],
[~jrottinghuis], [~varun_saxena], [~Naganarasimha] and [~rohithsharma]), what
do you guys think?
> Add druid as storage backend in YARN Timeline Service
> ------------------------------------------------------
>
> Key: YARN-5814
> URL: https://issues.apache.org/jira/browse/YARN-5814
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: ATSv2
> Affects Versions: 3.0.0-alpha2
> Reporter: Bingxue Qiu
>
> h3. Introduction
> I propose to add Druid as a storage backend in YARN Timeline Service.
> We run more than 6000 applications and generate 450 million metrics daily in
> Alibaba clusters with thousands of nodes. We need to collect and store
> meta/events/metrics data, analyze utilization reports across various
> dimensions online, and display trends in resource allocation/usage for the
> cluster by joining and aggregating data. This helps us manage and optimize
> the cluster by tracking resource utilization.
> To achieve this, we have switched from HBase to Druid as the storage and
> have achieved sub-second OLAP performance in our production environment for
> a few months.
> h3. Analysis
> Currently YARN Timeline Service only supports aggregating metrics at a) flow
> level by FlowRunCoprocessor and b) application level by
> AppLevelTimelineCollector. Offline (time-based periodic) aggregation for
> flows/users/queues for reporting and analysis is planned but not yet
> implemented. YARN Timeline Service uses Apache HBase as its primary storage
> backend, and HBase is not well suited to OLAP.
> For arbitrary exploration of data, such as online analysis of utilization
> reports across various dimensions (Queue, Flow, Users, Application, CPU,
> Memory) by joining and aggregating data, Druid's custom column format enables
> ad-hoc queries without pre-computation. The format also enables fast scans on
> columns, which is important for good aggregation performance.
> To support online analysis of utilization reports across various dimensions,
> display trends in resource allocation/usage for the cluster, and allow
> arbitrary exploration of data, we propose to add Druid storage and implement
> DruidWriter/DruidReader in YARN Timeline Service.
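
As a rough illustration of the kind of ad-hoc query described in the analysis
above, the sketch below builds a Druid native groupBy query that sums one
metric over arbitrary dimensions. The data source name `yarn_app_metrics`, the
dimensions `queue` and `user`, and the metric column `memory_mb` are
hypothetical; the proposal does not specify a schema, and this is not the
proposed DruidReader.

```python
import json


def build_utilization_query(data_source, dimensions, metric, interval,
                            granularity="hour"):
    """Build a Druid native groupBy query as a plain dict.

    All names passed in are chosen by the caller; nothing here reflects the
    schema actually used in the proposal.
    """
    if not dimensions:
        raise ValueError("at least one dimension is required")
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": granularity,
        # Dimensions are picked at query time, so no pre-computed rollup per
        # dimension combination is needed.
        "dimensions": list(dimensions),
        "aggregations": [
            {"type": "doubleSum", "name": f"total_{metric}", "fieldName": metric}
        ],
        "intervals": [interval],
    }


query = build_utilization_query(
    data_source="yarn_app_metrics",  # hypothetical data source
    dimensions=["queue", "user"],    # hypothetical dimension columns
    metric="memory_mb",              # hypothetical metric column
    interval="2016-11-01T00:00:00Z/2016-11-02T00:00:00Z",
)
print(json.dumps(query, indent=2))
```

The resulting JSON would typically be sent in an HTTP POST to a Druid broker's
`/druid/v2` endpoint. Changing the `dimensions` list is enough to slice the
same metric by flow or application instead.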