Re: Does Apache Spark 3 support GPU usage for Spark RDDs?

2021-09-21 Thread Artemis User
Unfortunately, the answer you got from the Hail forum is correct.  The 
current spark-rapids package doesn't support RDDs.  Please see 
https://nvidia.github.io/spark-rapids/docs/FAQ.html#what-parts-of-apache-spark-are-accelerated


To be able to use spark-rapids, one option would be to convert Hail to 
use the DataFrame API instead of RDDs.  Hope this helps...


-- ND

On 9/21/21 1:38 PM, Abhishek Shakya wrote:


Hi,

I am currently trying to run genomic analysis pipelines using Hail (a 
library for genomic analysis written in Python and Scala). Apache Spark 3 
was released recently, and it supports GPU usage.


I tried the spark-rapids library when starting an on-premise Slurm 
cluster with GPU nodes. I was able to initialise the cluster. However, 
when I tried running Hail tasks, the executors kept getting killed.


When I asked on the Hail forum, I got this response:

"That’s a GPU code generator for Spark-SQL, and Hail doesn’t use any 
Spark-SQL interfaces, only the RDD interfaces."

So, does Spark 3 not support GPU usage through the RDD interfaces?


PS: The question is also posted on Stack Overflow: Link 




Regards,
-

Abhishek Shakya
Senior Data Scientist 1,
Contact: +919002319890 | Email ID: abhishek.sha...@aganitha.ai 


Aganitha Cognitive Solutions 




Re: Does Apache Spark 3 support GPU usage for Spark RDDs?

2021-09-21 Thread Sean Owen
spark-rapids is not part of Spark, so I can't speak to it, but Spark
itself does not use GPUs at all.
It does let you configure a task to request a certain number of GPUs, and
that would work for RDDs, but it's up to the code being executed to use the
GPUs.
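
To make the point above concrete, here is a minimal PySpark sketch, not a
tested recipe. It assumes a Spark 3 deployment whose cluster manager
supports GPU resource scheduling and a GPU discovery script at a
placeholder path. The helper names (gpu_scheduling_conf,
run_on_assigned_gpu, main) are made up for illustration; the configuration
keys and TaskContext.resources() are the standard Spark 3 ones.

```python
def gpu_scheduling_conf(gpus_per_executor, gpus_per_task, discovery_script):
    """Build the Spark 3 settings that reserve GPUs for executors and tasks."""
    return {
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(gpus_per_task),
        # Script that reports the GPU addresses visible on a node.
        "spark.executor.resource.gpu.discoveryScript": discovery_script,
    }


def run_on_assigned_gpu(rows):
    """Runs inside an RDD task and tags each row with the assigned GPU address."""
    from pyspark import TaskContext  # lazy import: only needed on executors

    addresses = TaskContext.get().resources()["gpu"].addresses
    # Spark only hands out the addresses. Any real GPU work (CUDA, CuPy, ...)
    # has to be done by your own code at this point.
    for row in rows:
        yield (addresses[0], row)


def main():
    """Driver-side wiring; call this on a cluster that actually has GPUs."""
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("rdd-gpu-sketch")
    # "/path/to/getGpusResources.sh" is a placeholder, not a real location.
    settings = gpu_scheduling_conf(1, 1, "/path/to/getGpusResources.sh")
    for key, value in settings.items():
        conf.set(key, value)

    sc = SparkContext(conf=conf)
    try:
        rdd = sc.parallelize(range(4), 2)
        print(rdd.mapPartitions(run_on_assigned_gpu).collect())
    finally:
        sc.stop()
```

Nothing in this sketch makes the RDD operations themselves run on a GPU.
Spark schedules each task next to a reserved device, and the function passed
to mapPartitions decides what, if anything, to do with it.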



Does Apache Spark 3 support GPU usage for Spark RDDs?

2021-09-21 Thread Abhishek Shakya
Hi,

I am currently trying to run genomic analysis pipelines using Hail (a
library for genomic analysis written in Python and Scala). Apache Spark 3
was released recently, and it supports GPU usage.

I tried the spark-rapids library when starting an on-premise Slurm cluster
with GPU nodes. I was able to initialise the cluster. However, when I tried
running Hail tasks, the executors kept getting killed.

When I asked on the Hail forum, I got this response:

"That’s a GPU code generator for Spark-SQL, and Hail doesn’t use any
Spark-SQL interfaces, only the RDD interfaces."

So, does Spark 3 not support GPU usage through the RDD interfaces?


PS: The question is also posted on Stack Overflow: Link



Regards,
-

Abhishek Shakya
Senior Data Scientist 1,
Contact: +919002319890 | Email ID: abhishek.sha...@aganitha.ai
Aganitha Cognitive Solutions