qiubingxue created YARN-5814:
--------------------------------
Summary: Add druid as storage backend in YARN Timeline Service
Key: YARN-5814
URL: https://issues.apache.org/jira/browse/YARN-5814
Project: Hadoop YARN
Issue Type: New Feature
Components: ATSv2
Affects Versions: 3.0.0-alpha2
Reporter: qiubingxue
h3. Introduction
I propose adding Druid as a storage backend for YARN Timeline Service.
We run more than 6000 applications and generate 450 million metrics daily on
Alibaba clusters with thousands of nodes. We need to collect and store
meta/events/metrics data, analyze utilization reports online across various
dimensions, and display trends in allocated and used cluster resources by
joining and aggregating data. Tracking resource utilization in this way helps
us manage and optimize the cluster.
To achieve this, we switched from HBase to Druid as the storage and have had
sub-second OLAP performance in our production environment for a few months.
h3. Analysis
Currently YARN Timeline Service only supports aggregating metrics a) at the
flow level, by FlowRunCoprocessor, and b) at the application level, by
AppLevelTimelineCollector. Offline (time-based periodic) aggregation for
flows/users/queues for reporting and analysis is planned but not yet
implemented. YARN Timeline Service uses Apache HBase as its primary storage
backend, and HBase is not well suited to OLAP workloads.
For arbitrary exploration of data, such as online analysis of utilization
reports across various dimensions (Queue, Flow, Users, Application, CPU,
Memory) by joining and aggregating data, Druid's custom column format enables
ad-hoc queries without pre-computation. The format also enables fast scans on
columns, which is important for good aggregation performance.
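To illustrate what such an ad-hoc query might look like, here is a minimal
sketch that builds a Druid native groupBy query summing one metric per value of
one dimension. The datasource name (yarn_timeline_metrics) and the column names
(queue, memory_mb) are assumptions made for illustration only; they are not
part of this proposal or of any existing schema.

```python
import json


def build_utilization_query(dimension, metric, interval, granularity="hour"):
    """Build a Druid native groupBy query that sums `metric` per `dimension`."""
    return {
        "queryType": "groupBy",
        # Hypothetical datasource name, chosen only for this example.
        "dataSource": "yarn_timeline_metrics",
        "granularity": granularity,
        "dimensions": [dimension],
        "aggregations": [
            # doubleSum adds up the values of the named metric column.
            {"type": "doubleSum", "name": f"total_{metric}", "fieldName": metric}
        ],
        # ISO-8601 interval, e.g. "2016-11-01/2016-11-02".
        "intervals": [interval],
    }


# Hypothetical column names: memory usage per queue, hourly, for one day.
query = build_utilization_query("queue", "memory_mb", "2016-11-01/2016-11-02")
print(json.dumps(query, indent=2))
```

Switching the report to another dimension (for example user or flow) would
only change the arguments, not the stored data, which is the sense in which no
pre-computation is needed.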
To support online analysis of utilization reports across various dimensions,
display of trends in allocated and used cluster resources, and arbitrary
exploration of data, we propose adding Druid storage and implementing
DruidWriter/DruidReader in YARN Timeline Service.
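As a rough sketch of the kind of transformation a writer for Druid would
probably need, the example below flattens one entity's metric time series into
flat events (one timestamp, dimension columns, metric columns), which is the
shape Druid ingests. The entity layout, field names, and the flatten_entity
helper are all hypothetical; this is not the proposed DruidWriter
implementation.

```python
from collections import defaultdict

# Hypothetical dimension fields carried on every event.
DIMENSIONS = ("queue", "user", "flow", "application")


def flatten_entity(entity):
    """Turn per-metric time series into one flat event per timestamp."""
    values_by_ts = defaultdict(dict)
    for metric_name, series in entity["metrics"].items():
        for ts, value in series.items():
            values_by_ts[ts][metric_name] = value

    dims = {name: entity[name] for name in DIMENSIONS}
    # One row per timestamp: timestamp + dimensions + whichever metrics exist.
    return [
        {"timestamp": ts, **dims, **values_by_ts[ts]}
        for ts in sorted(values_by_ts)
    ]


# Hypothetical sample entity with two metrics sampled at millisecond timestamps.
sample_entity = {
    "queue": "default",
    "user": "alice",
    "flow": "daily_etl",
    "application": "application_1_0001",
    "metrics": {
        "cpu_vcores": {1000: 2, 2000: 3},
        "memory_mb": {1000: 512},
    },
}

events = flatten_entity(sample_entity)
for event in events:
    print(event)
```

Keeping the dimensions on every event is what would let a reader side group or
filter by any of them at query time.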
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)