[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579460#comment-14579460 ]
Viplav Madasu commented on YARN-796:
------------------------------------

Hi,

Just wanted to share an update on HP's use case for node labels in the recently announced HP Big Data Reference Architecture (HP BDRA), which was demoed at the HP Discover event last week.

HP BDRA represents a rethinking of Hadoop infrastructure, separating storage, networking and compute. The architecture delivers extreme flexibility, with the ability to scale each layer independently. Using some of the workload/compute/storage optimized servers that HP offers, and the YARN node labels feature on the compute layer, it is possible to double the density of today’s traditional Hadoop cluster with substantially better price performance while, at the same time, creating a single converged system that allows Hadoop (batch, interactive and NoSQL), Vertica, Spark and other big data technologies to share a common pool of data.

YARN labels allow us to create pools of compute nodes where applications run, so it is possible to dynamically provision clusters without repartitioning data, and to partition the cluster vertically to create isolated environments for batch, interactive and low-latency workloads. We also find that most workloads respond linearly to additional compute far beyond the “one spindle per core” rule that was prevalent before, so we can scale compute simply by adding more compute nodes or by reallocating nodes from lower-priority job labels to higher-priority job labels.

Most interesting is that, with labels, we can choose to deploy the YARN containers onto compute nodes that are optimized and accelerated for each workload. In our initial configuration, we use the HP Moonshot System with the HP ProLiant m710 Server Cartridge for Hadoop because it is extremely dense and cost effective, but also because it has an RDMA-capable NIC that we use to accelerate shuffles and an Intel Iris GPU onto which we might offload compression and other work.
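To make the reallocation idea above concrete, here is a minimal sketch in Python. It is only a toy model and does not call any YARN API: the `LabeledCluster` class, the node names and the "mr" and "hive" labels are all hypothetical, and it assumes each node carries exactly one label.

```python
class LabeledCluster:
    """Toy model of a compute layer where each node carries exactly one label."""

    def __init__(self):
        # Maps node name -> label (hypothetical names, not YARN identifiers).
        self.label_of = {}

    def add_node(self, node, label):
        """Register a compute node under a label."""
        self.label_of[node] = label

    def nodes_with(self, label):
        """Return the sorted names of all nodes currently carrying `label`."""
        return sorted(n for n, l in self.label_of.items() if l == label)

    def reallocate(self, count, src, dst):
        """Move up to `count` nodes from label `src` to label `dst`.

        Data placement is untouched; only the node-to-label mapping changes.
        """
        moved = self.nodes_with(src)[:count]
        for node in moved:
            self.label_of[node] = dst
        return moved


cluster = LabeledCluster()
for i in range(6):
    cluster.add_node(f"compute-{i}", "mr")    # batch pool
for i in range(6, 8):
    cluster.add_node(f"compute-{i}", "hive")  # interactive pool

# Shift two nodes from the batch pool to the interactive pool.
moved = cluster.reallocate(2, "mr", "hive")
print("moved:", moved)
print("hive pool:", cluster.nodes_with("hive"))
print("mr pool:", cluster.nodes_with("mr"))
```

The point of the sketch is that growing one pool is just a relabeling of compute nodes; nothing in the storage layer has to move.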
At last week's HP Discover event in Las Vegas, we demonstrated the flexibility of HP BDRA by partitioning the compute layer into MapReduce, Hive on Tez and HBase (using Slider) clusters accessing the same HDFS data. We could see that HBase throughput was unaffected by Hive/MR jobs, and we could dynamically provision more nodes for interactive Hive queries by reallocating nodes from the MR label to the Hive label; response time dropped instantaneously once more nodes were added to the Hive label.

Thanks to the Hadoop community for collaborating on this very useful addition to YARN functionality, and special thanks to [~wangda] and [~vinodkv] for all the initiative and support provided.

> Allow for (admin) labels on nodes and resource-requests
> -------------------------------------------------------
>
>                 Key: YARN-796
>                 URL: https://issues.apache.org/jira/browse/YARN-796
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.4.1
>            Reporter: Arun C Murthy
>            Assignee: Wangda Tan
>         Attachments: LabelBasedScheduling.pdf,
> Node-labels-Requirements-Design-doc-V1.pdf,
> Node-labels-Requirements-Design-doc-V2.pdf,
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf,
> YARN-796.node-label.consolidate.1.patch,
> YARN-796.node-label.consolidate.10.patch,
> YARN-796.node-label.consolidate.11.patch,
> YARN-796.node-label.consolidate.12.patch,
> YARN-796.node-label.consolidate.13.patch,
> YARN-796.node-label.consolidate.14.patch,
> YARN-796.node-label.consolidate.2.patch,
> YARN-796.node-label.consolidate.3.patch,
> YARN-796.node-label.consolidate.4.patch,
> YARN-796.node-label.consolidate.5.patch,
> YARN-796.node-label.consolidate.6.patch,
> YARN-796.node-label.consolidate.7.patch,
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1,
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)