There's some discussion of, and a proposal for, GPU support in this Spark JIRA:
https://jira.apache.org/jira/browse/SPARK-24615 "Accelerator-aware task
scheduling for Spark"

Susan

On Thu, Jul 12, 2018 at 11:17 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I agree.
>
> In my opinion, adding GPU capability to Spark is a must for advanced
> analytics.
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Thu, 12 Jul 2018 at 19:14, Maximiliano Felice <
> maximilianofel...@gmail.com> wrote:
>
>> Hi,
>>
>> I've been meaning to reply to this email for a while now; sorry for
>> taking so long.
>>
>> I personally think that adding GPU resource management would let us
>> speed up some of our ETL considerably. For the last year, I've worked on
>> porting some machine learning pipelines from Python (NumPy/pandas) to
>> Spark. Adding GPU capabilities to Spark would:
>>
>>
>>    - Accelerate many matrix and batch computations we currently run in
>>    TensorFlow
>>    - Allow us to use Spark for the whole pipeline (possibly combined with
>>    better online serving)
>>    - Let us trigger better hyperparameter selection directly from Spark
>>
>>
>> There are many more aspects of this that we could explore. What does
>> the rest of the list think?
>>
>> See you
>>
>> On Wed, 16 May 2018 at 2:58, Daniel Galvez (<dt.gal...@gmail.com>)
>> wrote:
>>
>>> Hi all,
>>>
>>> Is anyone here interested in adding the ability to request GPUs to
>>> Spark's client (i.e., spark-submit)? As of now, YARN 3.0's resource manager
>>> can schedule GPUs as resources via cgroups, but the Spark client has no
>>> way to request them.
>>>
>>> The ability to guarantee GPU resources would be practically useful for
>>> my organization. Right now, the only way to do that is to request the
>>> entire memory (or all CPUs) on a node, which is very kludgy and wastes
>>> resources, especially if your node has more than one GPU and your code is
>>> written such that an executor can use only one GPU at a time.
>>>
>>> Otherwise, I'm not sure of a good way to use libraries like
>>> Databricks' Deep Learning Pipelines
>>> <https://github.com/databricks/spark-deep-learning> for GPU-heavy
>>> computation, unless you are lucky enough to be in an organization that
>>> can virtualize compute nodes so that each node has only one GPU. Of
>>> course, I realize that many Databricks customers use Azure or AWS, which
>>> let you do this easily. Is this what people normally do in industry?
>>>
>>> This is something I am interested in working on, unless others out there
>>> have advice on why this is a bad idea.
>>>
>>> Unfortunately, I am not familiar enough with Mesos and Kubernetes right
>>> now to know how they schedule GPU resources, or whether adding support for
>>> requesting GPUs from them to the spark-submit client would be simple.
>>>
>>> Daniel
>>>
>>> --
>>> Daniel Galvez
>>> http://danielgalvez.me
>>> https://github.com/galv
>>>
>>


-- 
Susan X. Huynh
Software engineer, Data Agility
xhu...@mesosphere.com
