This thread may be better suited as a discussion in our Spark plug-in’s
repo:
https://github.com/NVIDIA/spark-rapids/discussions.

Just to answer the questions that were asked so far:

We have quite a bit of support for decimal and nested types and keep
adding coverage. I would recommend checking our documentation for what is
supported as of our latest release (22.06):
https://nvidia.github.io/spark-rapids/docs/supported_ops.html

For UDFs, if you are willing to rewrite them to use the RAPIDS cuDF API, we
do have support and examples on how to do this. Please check out:
https://nvidia.github.io/spark-rapids/docs/additional-functionality/rapids-udfs.html
Automatically translating UDFs to GPUs is not easy. We have a Scala
UDF-to-Catalyst transpiler that can handle very simple UDFs where every
operation has a corresponding Catalyst expression, which may be worth
checking out:
https://nvidia.github.io/spark-rapids/docs/additional-functionality/udf-to-catalyst-expressions.html
This transpiler falls back if it can’t translate any part of the UDF.

The plug-in will not fail when it can’t run part of a query on the GPU. It
falls back and runs the unsupported parts of the query on the CPU. It will
also report, on the driver (on .explain), what it can’t run on the GPU,
which should help narrow down an expression or exec that should be looked
at further.
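
As a rough illustration of turning that reporting on, here is a minimal
sketch that assembles a spark-submit command line. The configuration keys
and values are written from memory of the spark-rapids documentation, so
please verify them against the release you deploy. The helper function and
the my_job.py script name are hypothetical, and the plug-in jar (which also
has to be on the classpath) is left out.

```python
# Sketch only: keys and values below are assumptions recalled from the
# spark-rapids docs; check the configuration page for your release.
RAPIDS_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",  # load the plug-in
    "spark.rapids.sql.enabled": "true",             # allow GPU execution
    "spark.rapids.sql.explain": "NOT_ON_GPU",       # log what stays on the CPU
}


def spark_submit_args(conf, app):
    """Build a spark-submit argument list (hypothetical helper)."""
    args = ["spark-submit"]
    # Sort the keys so the generated command line is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# my_job.py is a placeholder for your own application.
print(" ".join(spark_submit_args(RAPIDS_CONF, "my_job.py")))
```

With a setting like this, the driver log should list the expressions and
execs that were left on the CPU along with the reason for each.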

There are other resources all linked from here:
https://nvidia.github.io/spark-rapids/ (of interest may be the
Qualification Tool, and our Getting Started guide for different cloud
providers and distros).

I’d say let’s continue this in the discussions or as issues in the
spark-rapids repo if you have further questions or run into issues, as it’s
not specific to Apache Spark.

Thanks!

Alessandro

On Sat, Aug 13, 2022 at 10:53 AM Sean Owen <sro...@gmail.com> wrote:

> This isn't a Spark question, but rather a question about whatever Spark
> application you are talking about. RAPIDS?
>
> On Sat, Aug 13, 2022 at 10:35 AM rajat kumar <kumar.rajat20...@gmail.com>
> wrote:
>
>> Thanks Sean.
>>
>> Also, I observed that lots of things are not supported in GPU by NVIDIA.
>> E.g. nested types/decimal type/Udfs etc.
>> So, will it use CPU automatically for running those tasks which require
>> nested types or will it run on GPU and fail.
>>
>> Thanks
>> Rajat
>>
>> On Sat, Aug 13, 2022, 18:54 Sean Owen <sro...@gmail.com> wrote:
>>
>>> Spark does not use GPUs itself, but tasks you run on Spark can.
>>> The only 'support' there is is for requesting GPUs as resources for
>>> tasks, so it's just a question of resource management. That's in OSS.
>>>
>>> On Sat, Aug 13, 2022 at 8:16 AM rajat kumar <kumar.rajat20...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have been hearing about GPU in spark3.
>>>>
>>>> For batch jobs , will it help to improve GPU performance. Also is GPU
>>>> support available only on Databricks or on cloud based Spark clusters ?
>>>>
>>>> I am new , if anyone can share insight , it will help
>>>>
>>>> Thanks
>>>> Rajat
>>>>
>>>
