Sotiris Niarchos created YARN-10621:
---------------------------------------
Summary: GPU management using OpenCL instead of vendor-specific
solutions
Key: YARN-10621
URL: https://issues.apache.org/jira/browse/YARN-10621
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager, yarn
Reporter: Sotiris Niarchos
As part of the [E2Data research project|https://e2data.eu/], we at the
[Institute of Communication and Computer Systems
(ICCS)|https://www.iccs.gr/en/?noredirect=en_US] of the National Technical
University of Athens, Greece, have been working on a modified version of
Hadoop YARN where the GPU devices that are available in the underlying cluster
are discovered via a Java wrapper of the OpenCL framework API (namely
[JOCL|https://github.com/gpu/JOCL]), instead of vendor-specific binaries.
In other words, we have shifted towards *a more uniform and high-level handling
of GPUs as "OpenCL-enabled" devices*. This way, we manage to *decouple GPU
discovery/management from vendor-specific technicalities*; every GPU, no matter
the vendor, is treated the same by E2Data YARN (more specifically, by the
{{NodeManager}} component), provided that the OpenCL runtime and drivers for
the GPU(s) of interest are installed on the respective node(s) of the cluster.
As a result, we *managed to use non-NVIDIA GPUs* (NVIDIA GPUs being the only
ones officially supported, via the {{nvidia-smi}} binary) with minimal additional
effort, after our initial changes.
Ultimately, our goal is to *unify every processing unit* that YARN can possibly
utilize (CPU cores, GPUs, FPGAs) *behind a common, simple, high-level
interface: that of the OpenCL-enabled device*.
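To make the idea of a common device interface more concrete, here is a minimal sketch of what a vendor-neutral device inventory could look like. It is an illustration only and not the E2Data code: the names {{OpenClDeviceInventory}}, {{Device}}, {{DeviceType}}, {{ofType}} and {{countByType}} are hypothetical, and the device list is hard-coded sample data. In a real setup the list would presumably be filled from OpenCL queries (for example through JOCL's bindings of {{clGetPlatformIDs}} and {{clGetDeviceIDs}}). The three device types mirror the OpenCL categories CPU, GPU and ACCELERATOR; treating FPGAs as ACCELERATOR devices is an assumption.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every processing unit is described the same way,
// regardless of vendor. None of these names come from YARN or E2Data.
class OpenClDeviceInventory {

    // Mirrors the OpenCL device-type categories.
    public enum DeviceType { CPU, GPU, ACCELERATOR }

    // Vendor-neutral descriptor; the vendor is kept only as plain metadata.
    public static final class Device {
        public final int index;
        public final DeviceType type;
        public final String vendor;

        public Device(int index, DeviceType type, String vendor) {
            this.index = index;
            this.type = type;
            this.vendor = vendor;
        }
    }

    // Select devices by type only; the vendor never influences the result.
    public static List<Device> ofType(List<Device> all, DeviceType type) {
        List<Device> result = new ArrayList<>();
        for (Device d : all) {
            if (d.type == type) {
                result.add(d);
            }
        }
        return result;
    }

    // Count how many devices of each type were discovered on a node.
    public static Map<DeviceType, Integer> countByType(List<Device> all) {
        Map<DeviceType, Integer> counts = new EnumMap<>(DeviceType.class);
        for (Device d : all) {
            counts.merge(d.type, 1, Integer::sum);
        }
        return counts;
    }

    // Hard-coded sample data standing in for an OpenCL discovery pass.
    public static List<Device> sampleDevices() {
        List<Device> devices = new ArrayList<>();
        devices.add(new Device(0, DeviceType.GPU, "VendorA"));
        devices.add(new Device(1, DeviceType.GPU, "VendorB"));
        devices.add(new Device(2, DeviceType.CPU, "VendorC"));
        devices.add(new Device(3, DeviceType.ACCELERATOR, "VendorD"));
        return devices;
    }

    public static void main(String[] args) {
        List<Device> discovered = sampleDevices();
        for (Device gpu : ofType(discovered, DeviceType.GPU)) {
            System.out.println("GPU " + gpu.index + " from " + gpu.vendor);
        }
        System.out.println(countByType(discovered));
    }
}
```

The point of the sketch is that a scheduler-side component only ever filters on the device type, so adding a GPU from another vendor requires no new code path, only a working OpenCL runtime on the node.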
The only drawback of our approach is that vendor-specific info regarding the
GPUs is lost (e.g. temperature). We believe, however, that the lost information
is not necessary for YARN; everything that Hadoop needs in order to discover
and handle GPU devices is provided by OpenCL.
For the time being, this is just a proposal and a prompt for discussion. This
modified version is a work in progress. We consider community feedback
regarding the core concept (and the fact that it may constitute a paradigm
shift for YARN) crucial before attaching any patch file and diving into more
(technical) details.