There is some discussion of, and a proposal for, supporting GPUs in this Spark JIRA: https://jira.apache.org/jira/browse/SPARK-24615 ("Accelerator-aware task scheduling for Spark").

Susan

On Thu, Jul 12, 2018 at 11:17 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I agree.
>
> Adding GPU capability to Spark is, in my opinion, a must for Advanced
> Analytics.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Thu, 12 Jul 2018 at 19:14, Maximiliano Felice
> <maximilianofel...@gmail.com> wrote:
>
>> Hi,
>>
>> I've been meaning to reply to this email for a while now; sorry for
>> taking so long.
>>
>> I personally think that adding GPU resource management would let us
>> boost some ETL performance a lot. For the last year, I've worked on
>> converting some machine learning pipelines from Python with NumPy/Pandas
>> to Spark. Adding GPU capabilities to Spark would:
>>
>> - Accelerate many matrix and batch computations we currently have in
>>   TensorFlow
>> - Allow us to use Spark for the whole pipeline (possibly combined with
>>   better online serving)
>> - Let us trigger better hyperparameter selection directly from Spark
>>
>> There are many more aspects of this that we could explore. What does
>> the rest of the list think?
>>
>> See you
>>
>> On Wed, May 16, 2018 at 2:58, Daniel Galvez <dt.gal...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> Is anyone here interested in adding the ability to request GPUs to
>>> Spark's client (i.e., spark-submit)? As of now, YARN 3.0's resource
>>> manager server can schedule GPUs as resources via cgroups, but the
>>> Spark client has no way to request them.
>>>
>>> The ability to guarantee GPU resources would be practically useful for
>>> my organization. Right now, the only way to do that is to request the
>>> entire memory (or all the CPUs) on a node, which is very kludgy and
>>> wastes resources, especially if your node has more than one GPU and
>>> your code was written such that an executor can use only one GPU at a
>>> time.
>>>
>>> I'm just not sure of a good way to make use of libraries like
>>> Databricks' Deep Learning Pipelines
>>> <https://github.com/databricks/spark-deep-learning> for GPU-heavy
>>> computation otherwise, unless you are lucky enough to be in an
>>> organization that can virtualize compute nodes such that each node has
>>> only one GPU. Of course, I realize that many Databricks customers are
>>> using Azure or AWS, which let you do this easily. Is this what people
>>> normally do in industry?
>>>
>>> This is something I am interested in working on, unless others out
>>> there have advice on why this is a bad idea.
>>>
>>> Unfortunately, I am not familiar enough with Mesos and Kubernetes right
>>> now to know how they schedule GPU resources, or whether adding support
>>> for requesting GPUs from them to the spark-submit client would be
>>> simple.
>>>
>>> Daniel
>>>
>>> --
>>> Daniel Galvez
>>> http://danielgalvez.me
>>> https://github.com/galv

--
Susan X. Huynh
Software engineer, Data Agility
xhu...@mesosphere.com
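
To make the cost of the workaround Daniel describes more concrete (claiming a whole node's memory so that no other executor can land on it and share its GPUs), here is a minimal back-of-the-envelope sketch in Python. The function names and the example node shape (256 GB of memory, 4 GPUs) are hypothetical and chosen only for illustration; the sketch assumes each executor drives exactly one GPU, as in the original message, and it does not call any Spark or YARN API.

```python
def executors_that_fit(node_memory_gb: int, executor_memory_gb: int) -> int:
    """Number of executors a scheduler could place on one node by memory alone."""
    if executor_memory_gb <= 0:
        raise ValueError("executor_memory_gb must be positive")
    return node_memory_gb // executor_memory_gb


def idle_gpus(node_gpus: int, node_memory_gb: int, executor_memory_gb: int) -> int:
    """GPUs left unused on a node, assuming each executor uses exactly one GPU.

    Assumption (from the thread): the scheduler knows nothing about GPUs, so
    memory is the only lever for limiting how many executors share a node.
    """
    executors = executors_that_fit(node_memory_gb, executor_memory_gb)
    busy = min(node_gpus, executors)
    return node_gpus - busy


if __name__ == "__main__":
    # Hypothetical node: 256 GB of memory and 4 GPUs.
    # Whole-node workaround: one executor claims all 256 GB.
    print("idle GPUs with whole-node request:", idle_gpus(4, 256, 256))
    # Smaller executors keep every GPU busy, but without GPU-aware
    # scheduling nothing guarantees each executor its own GPU.
    print("idle GPUs with 64 GB executors:", idle_gpus(4, 256, 64))
```

Under these assumptions the whole-node request leaves all but one GPU on the node idle, which is the waste that GPU-aware resource requests are meant to avoid.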