[jira] [Commented] (YARN-6645) Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024312#comment-16024312 ]

Bingxue Qiu commented on YARN-6645:
-----------------------------------

Hi [~cheersyang],

We backported YARN-1503 to Hadoop 2.8 in our clusters. To avoid this exception, we create the nmPrivateDir in the writeScriptToNMPrivateDir method as shown below. Suggestions are welcome. Thank you!

{code}
// Needs java.io.File, FileOutputStream, IOException, OutputStreamWriter, PrintWriter.
private File writeScriptToNMPrivateDir(String nmPrivateDir, String command)
    throws IOException {
  File dir = new File(nmPrivateDir);
  // mkdirs() also returns false when the directory already exists,
  // so only log an error if the directory is still missing.
  if (!dir.mkdirs() && !dir.exists()) {
    LOG.error("Failed to create nmPrivate dir " + dir);
  }
  File tmp = File.createTempFile("cmd_", "_tmp", dir);
  // try-with-resources closes the writer even if print() fails.
  try (PrintWriter printWriter = new PrintWriter(
      new OutputStreamWriter(new FileOutputStream(tmp), "UTF-8"))) {
    printWriter.print(command);
  }
  return tmp;
}
{code}

> Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
> ---------------------------------------------------------------------------
>
> Key: YARN-6645
> URL: https://issues.apache.org/jira/browse/YARN-6645
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Bingxue Qiu
> Fix For: 2.9.0
>
> Attachments: error when creating symlink.png
>
>
> When creating a symlink after a resource has been localized in our clusters, an
> IOException is thrown because the nmPrivateDir doesn't exist. We added a
> patch to fix it.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
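The idea in the comment above (make sure nmPrivateDir exists before writing the command file) can also be sketched with java.nio.file. This is only an illustration under assumptions, not the attached patch: the class name NmPrivateScriptWriter and the method name writeScript are hypothetical, and the demo paths are made up.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of Hadoop; names are illustrative only. */
final class NmPrivateScriptWriter {
  private NmPrivateScriptWriter() {}

  /** Creates nmPrivateDir if needed, then writes the command to a new temp file in it. */
  static Path writeScript(String nmPrivateDir, String command) throws IOException {
    // createDirectories succeeds if the directory already exists and
    // throws IOException if it cannot be created, so no silent failure.
    Path dir = Files.createDirectories(Paths.get(nmPrivateDir));
    Path tmp = Files.createTempFile(dir, "cmd_", "_tmp");
    Files.write(tmp, command.getBytes(StandardCharsets.UTF_8));
    return tmp;
  }

  public static void main(String[] args) throws IOException {
    // Demo against a throwaway directory; the nested path does not exist yet.
    Path base = Files.createTempDirectory("nm-demo");
    Path script = writeScript(base.resolve("nmPrivate/app_1").toString(), "ln -sf a b");
    System.out.println(new String(Files.readAllBytes(script), StandardCharsets.UTF_8));
  }
}
```

Compared with File.mkdirs(), this variant fails fast with an IOException when the directory cannot be created, instead of logging and then failing later in createTempFile.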
[jira] [Updated] (YARN-6645) Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-6645:
------------------------------
    Attachment: error when creating symlink.png

Added the error logs captured when creating the symlink.
[jira] [Commented] (YARN-6645) Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024302#comment-16024302 ]

Bingxue Qiu commented on YARN-6645:
-----------------------------------

Hi [~cheersyang], I will upload the logs and the patch later. Thank you!
[jira] [Updated] (YARN-6645) Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-6645:
------------------------------
    Fix Version/s: 2.9.0
[jira] [Updated] (YARN-6645) Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-6645:
------------------------------
    Description: When creating a symlink after a resource has been localized in our clusters, an IOException is thrown because the nmPrivateDir doesn't exist. We added a patch to fix it.
[jira] [Created] (YARN-6645) Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
Bingxue Qiu created YARN-6645:
---------------------------------

    Summary: Bug fix in ContainerImpl when calling the symLink of LinuxContainerExecutor
    Key: YARN-6645
    URL: https://issues.apache.org/jira/browse/YARN-6645
    Project: Hadoop YARN
    Issue Type: Bug
    Components: nodemanager
    Reporter: Bingxue Qiu
[jira] [Updated] (YARN-6624) The implementation of getLocalizationStatus
[ https://issues.apache.org/jira/browse/YARN-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-6624:
------------------------------
    Attachment: YARN-6624.1.patch

Added YARN-6624.1.patch.

> The implementation of getLocalizationStatus
> -------------------------------------------
>
> Key: YARN-6624
> URL: https://issues.apache.org/jira/browse/YARN-6624
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.9.0
> Reporter: Bingxue Qiu
> Attachments: YARN-6624.1.patch
>
>
> We have a use case where the client needs to know the state of resource
> localization. Following the design in [Continuous-resource-localization|https://issues.apache.org/jira/secure/attachment/12825041/Continuous-resource-localization.pdf],
> we chose to include it as part of ContainerStatus.
> Proposal:
> When calling getContainerStatus, the client can check the state through
> pendingResources and resourcesFailedToBeLocalized in ResourceSet.
[jira] [Updated] (YARN-6606) The implementation of LocalizationStatus in ContainerStatusProto
[ https://issues.apache.org/jira/browse/YARN-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-6606:
------------------------------
    Attachment: YARN-6606.2.patch

Added YARN-6606.2.patch.

> The implementation of LocalizationStatus in ContainerStatusProto
> ----------------------------------------------------------------
>
> Key: YARN-6606
> URL: https://issues.apache.org/jira/browse/YARN-6606
> Project: Hadoop YARN
> Issue Type: Task
> Components: nodemanager
> Affects Versions: 2.9.0
> Reporter: Bingxue Qiu
> Fix For: 2.9.0
>
> Attachments: YARN-6606.1.patch, YARN-6606.2.patch
>
>
> We have a use case that requires the full implementation of localization status in
> ContainerStatusProto, as described in
> [Continuous-resource-localization|https://issues.apache.org/jira/secure/attachment/12825041/Continuous-resource-localization.pdf],
> so we implemented it.
[jira] [Created] (YARN-6624) The implementation of getLocalizationStatus
Bingxue Qiu created YARN-6624:
---------------------------------

    Summary: The implementation of getLocalizationStatus
    Key: YARN-6624
    URL: https://issues.apache.org/jira/browse/YARN-6624
    Project: Hadoop YARN
    Issue Type: Improvement
    Components: nodemanager
    Affects Versions: 2.9.0
    Reporter: Bingxue Qiu
    Fix For: 2.9.0

We have a use case where the client needs to know the state of resource localization. Following the design in [Continuous-resource-localization|https://issues.apache.org/jira/secure/attachment/12825041/Continuous-resource-localization.pdf], we chose to include it as part of ContainerStatus.

Proposal:
When calling getContainerStatus, the client can check the state through pendingResources and resourcesFailedToBeLocalized in ResourceSet.
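The proposal above could be sketched roughly as follows. This is a hypothetical illustration, not the YARN-6624 patch: it assumes pendingResources and resourcesFailedToBeLocalized can be viewed as sets of resource names, and the State enum, the class name, and the statusOf helper are invented for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; the types and names below are assumptions. */
final class LocalizationStatusSketch {
  /** Illustrative states; the real patch may use different names. */
  enum State { PENDING, FAILED, LOCALIZED }

  private LocalizationStatusSketch() {}

  /**
   * Derives a per-resource state: a resource that is neither failed
   * nor pending is treated as localized.
   */
  static Map<String, State> statusOf(Set<String> requested,
      Set<String> pendingResources, Set<String> resourcesFailedToBeLocalized) {
    Map<String, State> status = new LinkedHashMap<>();
    for (String name : requested) {
      if (resourcesFailedToBeLocalized.contains(name)) {
        status.put(name, State.FAILED);
      } else if (pendingResources.contains(name)) {
        status.put(name, State.PENDING);
      } else {
        status.put(name, State.LOCALIZED);
      }
    }
    return status;
  }

  public static void main(String[] args) {
    Set<String> requested = new LinkedHashSet<>(Arrays.asList("a.jar", "b.jar", "c.jar"));
    Set<String> pending = new LinkedHashSet<>(Arrays.asList("b.jar"));
    Set<String> failed = new LinkedHashSet<>(Arrays.asList("c.jar"));
    System.out.println(statusOf(requested, pending, failed));
  }
}
```

Checking the failed set before the pending set is a design choice made for this sketch, so a resource that appears in both is reported as failed.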
[jira] [Commented] (YARN-1503) Support making additional 'LocalResources' available to running containers
[ https://issues.apache.org/jira/browse/YARN-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012024#comment-16012024 ]

Bingxue Qiu commented on YARN-1503:
-----------------------------------

Hi [~jianhe], we have a use case that needs the full implementation of localization status in ContainerStatusProto, so we implemented it in [YARN-6606|https://issues.apache.org/jira/browse/YARN-6606]. Any advice is welcome. Thanks.

> Support making additional 'LocalResources' available to running containers
> --------------------------------------------------------------------------
>
> Key: YARN-1503
> URL: https://issues.apache.org/jira/browse/YARN-1503
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jian He
> Attachments: Continuous-resource-localization.pdf
>
>
> We have a use case, where additional resources (jars, libraries etc) need to
> be made available to an already running container. Ideally, we'd like this to
> be done via YARN (instead of having potentially multiple containers per node
> download resources on their own).
> Proposal:
> NM to support an additional API where a list of resources can be specified.
> Something like "localiceResource(ContainerId, Map)"
> NM would also require an additional API to get state for these resources -
> "getLocalizationState(ContainerId)" - which returns the current state of all
> local resources for the specified container(s).
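To make the shape of the two proposed NM calls concrete, here is a toy in-memory model. It is a sketch under assumptions and not the NodeManager API: container IDs and resources are plain strings, the Map parameter is assumed to map a resource name to its source URI, and the class InMemoryLocalizer, the ResourceState enum, and markLocalized are hypothetical. The method is named localizeResource here (the issue text spells it "localiceResource").

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy model of the proposed calls; not Hadoop code. */
final class InMemoryLocalizer {
  /** Illustrative states only. */
  enum ResourceState { PENDING, LOCALIZED }

  // containerId -> (resource name -> state)
  private final Map<String, Map<String, ResourceState>> byContainer = new HashMap<>();

  /** Registers additional resources (name -> source URI) for a running container. */
  void localizeResource(String containerId, Map<String, String> resources) {
    Map<String, ResourceState> states = byContainer.get(containerId);
    if (states == null) {
      states = new LinkedHashMap<>();
      byContainer.put(containerId, states);
    }
    for (String name : resources.keySet()) {
      // Re-requesting a known resource does not reset its state.
      if (!states.containsKey(name)) {
        states.put(name, ResourceState.PENDING);
      }
    }
  }

  /** Stand-in for the download finishing; a real NM would do this asynchronously. */
  void markLocalized(String containerId, String name) {
    Map<String, ResourceState> states = byContainer.get(containerId);
    if (states != null && states.containsKey(name)) {
      states.put(name, ResourceState.LOCALIZED);
    }
  }

  /** Returns a read-only snapshot of the state of all resources for the container. */
  Map<String, ResourceState> getLocalizationState(String containerId) {
    Map<String, ResourceState> states = byContainer.get(containerId);
    if (states == null) {
      return Collections.<String, ResourceState>emptyMap();
    }
    return Collections.unmodifiableMap(new LinkedHashMap<>(states));
  }

  public static void main(String[] args) {
    InMemoryLocalizer nm = new InMemoryLocalizer();
    Map<String, String> extra = new LinkedHashMap<>();
    extra.put("udf.jar", "hdfs:///libs/udf.jar");
    nm.localizeResource("container_1", extra);
    nm.markLocalized("container_1", "udf.jar");
    System.out.println(nm.getLocalizationState("container_1"));
  }
}
```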
[jira] [Updated] (YARN-6606) The implementation of LocalizationStatus in ContainerStatusProto
[ https://issues.apache.org/jira/browse/YARN-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-6606:
------------------------------
    Attachment: YARN-6606.1.patch

Added YARN-6606.1.patch.
[jira] [Created] (YARN-6606) The implementation of LocalizationStatus in ContainerStatusProto
Bingxue Qiu created YARN-6606:
---------------------------------

    Summary: The implementation of LocalizationStatus in ContainerStatusProto
    Key: YARN-6606
    URL: https://issues.apache.org/jira/browse/YARN-6606
    Project: Hadoop YARN
    Issue Type: Task
    Components: nodemanager
    Affects Versions: 2.9.0
    Reporter: Bingxue Qiu

We have a use case that requires the full implementation of localization status in ContainerStatusProto, as described in [Continuous-resource-localization|https://issues.apache.org/jira/secure/attachment/12825041/Continuous-resource-localization.pdf], so we implemented it.
[jira] [Commented] (YARN-3881) Writing RM cluster-level metrics
[ https://issues.apache.org/jira/browse/YARN-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676313#comment-15676313 ]

Bingxue Qiu commented on YARN-3881:
-----------------------------------

Hi [~zjshen], I couldn't find the cluster-level totalVirtualCores / totalMB metrics in metrics.json. They may be needed to show the capacity water-line trend when the set of nodes changes, for example when nodes are added or fail.

> Writing RM cluster-level metrics
> --------------------------------
>
> Key: YARN-3881
> URL: https://issues.apache.org/jira/browse/YARN-3881
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Labels: YARN-5355
> Attachments: metrics.json
>
>
> RM has a bunch of metrics that we may want to write into the timeline backend
> too. I attached the metrics.json that I've crawled via
> {{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to
> three groups of metrics:
> 1. QueueMetrics
> 2. JvmMetrics
> 3. ClusterMetrics
> The problem is that, unlike other metrics that belong to a single application,
> these belong to the RM or are cluster-wide. Therefore, the current write path is
> not going to work for these metrics because they don't have the associated
> user/flow/app context info. We need to rethink the modeling of cross-app metrics
> and the API to handle them.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15676080#comment-15676080 ]

Bingxue Qiu commented on YARN-5814:
-----------------------------------

Thanks [~sjlee0] for your suggestions!

On the Druid reader side, queries go through Drill, so conditions such as filter lists can be supported with a self-join or a left join. For example:

{code}
SELECT F.*
FROM druid.timeline_service_app F, druid.timeline_service_app S
WHERE F.appId = S.appId
  AND F.startTime > 1479440083000
  AND S.finishTime > 0
  AND F.appId = 'application_1476875405903_49989';
{code}

Thank you also for pointing out the new issues. Druid supports ordering by a column, so would it make sense to add a column named "idPrefix"?

> Add druid as storage backend in YARN Timeline Service
> ------------------------------------------------------
>
> Key: YARN-5814
> URL: https://issues.apache.org/jira/browse/YARN-5814
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: ATSv2
> Affects Versions: 3.0.0-alpha2
> Reporter: Bingxue Qiu
> Attachments: Add-Druid-in-YARN-Timeline-Service.pdf
>
>
> h3. Introduction
> I propose to add Druid as a storage backend in YARN Timeline Service.
> We run more than 6000 applications and generate 450 million metrics daily in
> Alibaba clusters with thousands of nodes. We need to collect and store
> meta/events/metrics data, analyze utilization reports online across various
> dimensions, and display the trends of resource allocation and usage for the
> cluster by joining and aggregating data. This helps us manage and optimize the
> cluster by tracking resource utilization.
> To achieve this, we switched from HBase to Druid as the storage and have had
> sub-second OLAP performance in our production environment for a few months.
> h3. Analysis
> Currently YARN Timeline Service only supports aggregating metrics at a) the flow
> level, by FlowRunCoprocessor, and b) the application level, by
> AppLevelTimelineCollector. Offline (time-based periodic) aggregation for
> flows/users/queues for reporting and analysis is planned but not yet
> implemented. YARN Timeline Service uses Apache HBase as the primary
> storage backend, and HBase is not a good fit for OLAP.
> For arbitrary exploration of data, such as online analysis of utilization
> reports across various dimensions (Queue, Flow, Users, Application, CPU, Memory)
> by joining and aggregating data, Druid's custom column format enables ad-hoc
> queries without pre-computation. The format also enables fast scans on
> columns, which is important for good aggregation performance.
> To support online analysis of utilization reports across various dimensions,
> display of allocation/usage trends for the cluster, and arbitrary exploration
> of data, we propose to add Druid storage and implement
> DruidWriter/DruidReader in YARN Timeline Service.
[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670054#comment-15670054 ]

Bingxue Qiu commented on YARN-5814:
-----------------------------------

Thanks [~gtCarrera9] for your suggestions:

1. On the writer design: we implemented the writer with Kafka plus an MR job (for HA) that pulls data into Druid's realtime nodes. I'm not sure this approach fits others as well; Tranquility is simpler, after all. I will share the design of both later, and we can choose to implement one or both.

2. On the table design: in the Druid implementation, a timeline.entity table may not be a good fit for holding general timeline entities, including container data. In the HBase implementation, we can store general timeline entities under a column family in the entity table and scan them by row key. Druid, however, is fixed-schema columnar storage; if we need real-time ad-hoc queries and aggregation, timeline.entity would likely become a wide table with many columns. That would introduce data redundancy, generate many rows, and increase cache misses. That is why we propose adding these tables rather than timeline.entity.

Please feel free to give your suggestions. Thanks!
[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656598#comment-15656598 ]

Bingxue Qiu commented on YARN-5814:
-----------------------------------

I have uploaded the design. It contains our ideas about the Druid writer, reader, and schema. Please feel free to give your suggestions. Thanks.
[jira] [Updated] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-5814:
------------------------------
    Attachment: Add-Druid-in-YARN-Timeline-Service.pdf
[jira] [Updated] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-5814:
------------------------------
    Attachment:     (was: Add-Druid-in-YARN-Timeline-Service.pdf)
[jira] [Updated] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-5814:
------------------------------
    Attachment: Add-Druid-in-YARN-Timeline-Service.pdf
[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635734#comment-15635734 ]

Bingxue Qiu commented on YARN-5814:
-----------------------------------

Thanks [~djp], [~sjlee0] for your support! I will give a more concrete design next week
[jira] [Issue Comment Deleted] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bingxue Qiu updated YARN-5814:
------------------------------
    Comment: was deleted

(was: Thanks [~djp], [~sjlee0] for your support! I will give a more concrete design next week
)
[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635730#comment-15635730 ]

Bingxue Qiu commented on YARN-5814:
-----------------------------------

Thanks [~djp], [~sjlee0] for your support! I will give a more concrete design next week