Viplav Madasu commented on YARN-796:
Just wanted to share an update on HP's use case for node labels in the recently
announced HP Big Data Reference Architecture (HP BDRA), which was demoed at the
HP Discover event last week.
HP BDRA represents a rethinking of Hadoop infrastructure with the separation
of storage, networking and compute.
The architecture is highly flexible because each layer can be scaled
independently. Using some of the workload-, compute- and storage-optimized
servers that HP offers, and using the YARN node labels feature on the compute
layer, it is possible to double the density of today’s traditional Hadoop
cluster with substantially better price/performance while, at the same time,
creating a single converged system that allows Hadoop (batch, interactive and
NoSQL), Vertica, Spark and other big data technologies to share a common pool
of data.
YARN node labels allow us to create pools of compute nodes where applications
run, so it is possible to dynamically provision clusters without repartitioning
data, and to partition the cluster vertically to create isolated environments
for batch, interactive and low-latency workloads.
We also find that most workloads respond linearly to additional compute, far
beyond the “one spindle per core” rule that was prevalent before, and we can
scale compute simply by adding more compute nodes or by reallocating nodes from
lower-priority job labels to higher-priority job labels.
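As a rough illustration of the idea, the sketch below models a compute layer
as a mapping from node to label and shows that moving capacity between
workloads is only a relabeling step. It is a toy model, not the YARN API: the
class, the node names and the "mr"/"hive" labels are all hypothetical.

```python
from collections import defaultdict


class LabeledCluster:
    """Toy model of a compute layer partitioned by node labels (not YARN code)."""

    def __init__(self):
        # Hypothetical bookkeeping: node name -> label it currently carries.
        self.label_of = {}

    def add_node(self, node, label):
        """Add a compute node to the pool identified by `label`."""
        self.label_of[node] = label

    def pools(self):
        """Group nodes by label; each group is one isolated compute pool."""
        grouped = defaultdict(list)
        for node, label in sorted(self.label_of.items()):
            grouped[label].append(node)
        return dict(grouped)

    def reallocate(self, src, dst, count):
        """Relabel up to `count` nodes from pool `src` to pool `dst`.

        Only the label changes. No data is moved, which mirrors the
        assumption that storage lives in a separate layer.
        """
        movable = sorted(n for n, l in self.label_of.items() if l == src)[:count]
        for node in movable:
            self.label_of[node] = dst
        return movable


# Hypothetical cluster: six nodes labeled "mr" and two labeled "hive".
cluster = LabeledCluster()
for i in range(6):
    cluster.add_node(f"compute-{i}", "mr")
for i in range(6, 8):
    cluster.add_node(f"compute-{i}", "hive")

# Give the higher-priority "hive" pool two more nodes taken from "mr".
moved = cluster.reallocate("mr", "hive", 2)
print(moved)
print({label: len(nodes) for label, nodes in cluster.pools().items()})
```

In a real cluster the relabeling would be done through YARN's admin
operations on node labels rather than an in-memory dictionary.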
Most interesting is that, with labels, we can choose to deploy YARN containers
onto compute nodes that are optimized and accelerated for each workload. In our
initial configuration, we use the HP Moonshot System with the HP ProLiant m710
Server Cartridge for Hadoop because it is extremely dense and cost effective,
but also because it has an RDMA-capable NIC that we use to accelerate shuffles
and an Intel Iris GPU onto which we might offload compression and other work.
At last week's HP Discover event in Las Vegas, we demonstrated the flexibility
of HP BDRA by partitioning the compute layer into MapReduce, Hive on Tez and
HBase (using Slider) clusters accessing the same HDFS data. We saw that HBase
throughput was unaffected by Hive/MR jobs, and we could dynamically provision
more nodes for interactive Hive queries by reallocating nodes from the MR label
to the Hive label; response time dropped immediately as nodes were added to the
Hive label.
Thanks to the Hadoop community for collaborating on this very useful addition
to YARN functionality, and special thanks to [~wangda] and [~vinodkv] for all
the initiative and support provided.
> Allow for (admin) labels on nodes and resource-requests
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 2.4.1
> Reporter: Arun C Murthy
> Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf,
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf,
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1,
> YARN-796.patch, YARN-796.patch4
> It will be useful for admins to specify labels for nodes. Examples of labels
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.