[
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter updated YARN-5683:
--------------------------------
Labels: oct16-hard (was: )
> Support specifying storage type for per-application local dirs
> --------------------------------------------------------------
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Affects Versions: 3.0.0-alpha2
> Reporter: Tao Yang
> Assignee: Tao Yang
> Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch,
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>
> h3. Introduction
> * Some applications of various frameworks (Flink, Spark and MapReduce etc)
> using local storage (checkpoint, shuffle etc) might require high IO
> performance. It's useful to allocate local directories to high performance
> storage media for these applications on heterogeneous clusters.
> * YARN does not distinguish different storage types and hence applications
> cannot selectively use storage media with different performance
> characteristics. Adding awareness of storage media can allow YARN to make
> better decisions about the placement of local directories.
> h3. Approach
> * NodeManager will distinguish storage types for local directories.
> ** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration
> should allow the cluster administrator to optionally specify the storage type
> for each local directories. Example:
> [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to
> [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
> ** StorageType defines DISK/SSD storage types and takes DISK as the default
> storage type.
> ** StorageLocation separates storage type and directory path, used by
> LocalDirAllocator to aware the types of local dirs, the default storage type
> is DISK.
> ** getLocalPathForWrite method of LocalDirAllcator will prefer to choose the
> local directory of the specified storage type, and will fallback to not care
> storage type if the requirement can not be satisfied.
> ** Support for container related local/log directories by ContainerLaunch.
> All application frameworks can set the environment variables
> (LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE) to specified the desired storage
> type of local/log directories, and choose to not launch container if fallback
> through these environment variables (ENSURE_LOCAL_STORAGE_TYPE and
> ENSURE_LOG_STORAGE_TYPE).
> * Allow specified storage type for various frameworks (Take MapReduce as an
> example)
> ** Add new configurations should allow application administrator to
> optionally specify the storage type of local/log directories and fallback
> strategy (MapReduce configurations: mapreduce.job.local-storage-type,
> mapreduce.job.log-storage-type, mapreduce.job.ensure-local-storage-type and
> mapreduce.job.ensure-log-storage-type).
> ** Support for container work directories. Set the environment variables
> includes LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to configurations
> above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce
> should update YARNRunner and TaskAttemptImpl)
> ** Add storage type prefix for request path to support for other local
> directories of frameworks (such as shuffle directories for MapReduce).
> (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to
> support for output/work directories)
> ** Flow diagram for MapReduce framework
> !flow_diagram_for_MapReduce-2.png!
> h3. Further Discussion
> * The requirement of storage type for local/log directories may not be
> satisfied on heterogeneous clusters. To achieve global optimum, scheduler
> should aware and manage disk resources.
> [YARN-2139|https://issues.apache.org/jira/browse/YARN-2139] is close to that
> but seems not support multiple storage types, maybe we should do even more to
> aware the storage type of disk resource?
> * Node labels or node constraints
> ([YARN-3409|https://issues.apache.org/jira/browse/YARN-3409]) can also make a
> higher chance to satisfy the requirement of specified storage type.
> * Fallback strategy still needs to be concerned. Certain applications might
> not work well when the requirement of storage type is not satisfied. When
> none of desired storage type disk are available, should container launching
> be failed? let AM handle? We have implemented a fallback strategy that fail
> to launch container when none of desired storage type disk are available. Is
> there some better methods?
> This feature has been used for half a year to meet the needs of some
> applications on Alibaba search clusters.
> Please feel free to give your suggestions and opinions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]