[jira] [Commented] (YARN-10621) GPU management using OpenCL instead of vendor-specific solutions

Adam Antal (Jira) Fri, 12 Feb 2021 06:14:06 -0800


    [ 
https://issues.apache.org/jira/browse/YARN-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283717#comment-17283717
 ]


Adam Antal commented on YARN-10621:
-----------------------------------

Thanks for bringing this issue to the community!

From the JIRA details I assume that you're using version 3.1.1, which does not 
yet include the {{DevicePlugin}} interface. This interface was added to the 
code for this very purpose: discovering custom resources provided by plugins, 
just like the Nvidia GPUs. For more information, look at the umbrella jira: 
YARN-8851.

I don't know if the changes you've been working on are based on this work, but 
it's the recommended way from 3.3.0 on. How does that align with your effort?


> GPU management using OpenCL instead of vendor-specific solutions
> ----------------------------------------------------------------
>
>                 Key: YARN-10621
>                 URL: https://issues.apache.org/jira/browse/YARN-10621
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>            Reporter: Sotiris Niarchos
>            Priority: Minor
>
> As part of the [E2Data research project|https://e2data.eu/], we at the 
> [Institute of Communication and Computer Systems 
> (ICCS)|https://www.iccs.gr/en/?noredirect=en_US] of the National Technical 
> University of Athens, Greece, have been working on a modified version of 
> Hadoop Yarn where the GPU devices that are available in the underlying 
> cluster are discovered via a Java wrapper of the OpenCL framework API (namely 
> [JOCL|https://github.com/gpu/JOCL]), instead of vendor-specific binaries.
> In other words, we have shifted towards *a more uniform and high-level 
> handling of GPUs as "OpenCL-enabled" devices*. This way, we manage to 
> *decouple GPU discovery/management from vendor-specific technicalities*; 
> every GPU, no matter the vendor, is the same for E2Data YARN (more 
> specifically, for the {{NodeManager}} component), provided that the OpenCL 
> runtime and drivers for the GPU(s) of interest are installed on the 
> respective node(s) of the cluster.
> This way, we *managed to use GPUs other than NVIDIA* (which are the only ones 
> officially supported via the {{nvidia-smi}} binary) with minimal additional 
> effort, after our initial changes.
> Ultimately, our goal is to *unify every processing unit* that YARN can 
> possibly utilize (CPU cores, GPUs, FPGAs) *behind a common, simple, 
> high-level interface; that of the OpenCL-enabled device*.
> The only drawback of our approach is that vendor-specific info regarding the 
> GPUs is lost (e.g. temperature). We believe, however, that the lost 
> information is not necessary for YARN; everything that Hadoop needs in order 
> to discover and handle GPU devices is provided by OpenCL.
> This is just a proposition/a prompt for discussion for the time being. This 
> modified version is a work in progress. We consider community feedback 
> regarding the core concept (and the fact that it may constitute a paradigm 
> shift for YARN) crucial before attaching any patch file and diving into more 
> (technical) details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
