[ https://issues.apache.org/jira/browse/YARN-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283717#comment-17283717 ]
Adam Antal commented on YARN-10621:
-----------------------------------

Thanks for bringing this issue to the community!

From the JIRA details I assume that you're using version 3.1.1, which does not yet include the {{DevicePlugin}} interface. This interface was added to the code for the very same purpose: discovering custom resources provided by such plugins, just like the Nvidia GPUs. For more information, look at the umbrella jira: YARN-8851.

I don't know if the changes you've been working on are based on this work, but it's the recommended way from 3.3.0 on. How does that align with your effort?

> GPU management using OpenCL instead of vendor-specific solutions
> ----------------------------------------------------------------
>
>                 Key: YARN-10621
>                 URL: https://issues.apache.org/jira/browse/YARN-10621
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>            Reporter: Sotiris Niarchos
>            Priority: Minor
>
> As part of the [E2Data research project|https://e2data.eu/], we at the
> [Institute of Communication and Computer Systems
> (ICCS)|https://www.iccs.gr/en/?noredirect=en_US] of the National Technical
> University of Athens, Greece, have been working on a modified version of
> Hadoop YARN where the GPU devices that are available in the underlying
> cluster are discovered via a Java wrapper of the OpenCL framework API (namely
> [JOCL|https://github.com/gpu/JOCL]), instead of vendor-specific binaries.
> In other words, we have shifted towards *a more uniform and high-level
> handling of GPUs as "OpenCL-enabled" devices*. This way, we manage to
> *decouple GPU discovery/management from vendor-specific technicalities*;
> every GPU, no matter the vendor, is the same for E2Data YARN (more
> specifically, for the {{NodeManager}} component), provided that the OpenCL
> runtime and drivers for the GPU(s) of interest are installed on the
> respective node(s) of the cluster.
> This way, we *managed to use GPUs other than NVIDIA* (which are the only ones
> officially supported via the {{nvidia-smi}} binary) with minimal additional
> effort, after our initial changes.
> Ultimately, our goal is to *unify every processing unit* that YARN can
> possibly utilize (CPU cores, GPUs, FPGAs) *behind a common, simple,
> high-level interface; that of the OpenCL-enabled device*.
> The only drawback of our approach is that vendor-specific info regarding the
> GPUs is lost (e.g. temperature). We believe, however, that the lost
> information is not necessary for YARN; everything that Hadoop needs in order
> to discover and handle GPU devices is provided by OpenCL.
> This is just a proposition/a prompt for discussion for the time being. This
> modified version is a work in progress. We consider community feedback
> regarding the core concept (and the fact that it may constitute a paradigm
> shift for YARN) crucial before attaching any patch file and diving into more
> (technical) details.
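
The vendor-neutral idea in the description can be pictured with a minimal sketch. It uses plain Java only and is not taken from the E2Data code, from JOCL, or from YARN. The class VendorNeutralGpuCount, the ClDevice type, and the sample inventory are all hypothetical; in a real setup the device list would presumably come from an OpenCL query on the node. The only point shown is that the GPU count is computed without ever looking at the vendor.

```java
import java.util.Arrays;
import java.util.List;

public class VendorNeutralGpuCount {

    /** Hypothetical, simplified view of one OpenCL device (not a JOCL or YARN type). */
    static final class ClDevice {
        final String vendor;
        final String name;
        final boolean isGpu;

        ClDevice(String vendor, String name, boolean isGpu) {
            this.vendor = vendor;
            this.name = name;
            this.isGpu = isGpu;
        }
    }

    /** Counts GPU devices; the vendor field is deliberately never consulted. */
    static int countGpus(List<ClDevice> devices) {
        int count = 0;
        for (ClDevice d : devices) {
            if (d.isGpu) {
                count++;
            }
        }
        return count;
    }

    /** Made-up sample inventory standing in for the result of an OpenCL query. */
    static List<ClDevice> sampleDevices() {
        return Arrays.asList(
                new ClDevice("VendorA", "gpu-0", true),
                new ClDevice("VendorB", "gpu-1", true),
                new ClDevice("VendorC", "cpu-0", false));
    }

    public static void main(String[] args) {
        // Two GPUs from different vendors are counted the same way.
        System.out.println("GPUs to report: " + countGpus(sampleDevices()));
    }
}
```

With the sample inventory above, the two GPUs from different vendors are treated identically and the CPU entry is skipped.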