Re: GPU job in Spark 3

2021-04-15 Thread Gourav Sengupta
Hi,

Completely agree with Hao. In case you are using YARN, check the EMR
documentation on how to enable GPUs as a resource in YARN before trying to use
them in Spark.
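
As a rough sketch of what that involves (the property names and values should be
checked against the EMR/Hadoop docs for your version, and the discovery-script
path here is only an example): YARN first has to advertise GPUs as a resource,
and the Spark job then requests them, roughly like this:

# yarn-site.xml on the NodeManagers: enable the GPU resource plugin, e.g.
#   yarn.nodemanager.resource-plugins = yarn.io/gpu
# then request GPUs from the Spark side (plus the cudf / rapids-4-spark jars
# as in the getting-started guide):
spark-shell \
   --master yarn \
   --conf spark.plugins=com.nvidia.spark.SQLPlugin \
   --conf spark.executor.resource.gpu.amount=1 \
   --conf spark.task.resource.gpu.amount=0.25 \
   --conf spark.executor.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh \
   --files /opt/sparkRapidsPlugin/getGpusResources.sh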

This is one of the most exciting features of Spark 3, and you can reap huge
benefits from it :)


Regards,
Gourav Sengupta

On Fri, Apr 9, 2021 at 6:10 PM HaoZ  wrote:



Re: GPU job in Spark 3

2021-04-09 Thread Sean Owen
(I apologize, I totally missed that this should use GPUs because of RAPIDS.
Ignore my previous reply. But yeah, it's more of a RAPIDS question.)

On Fri, Apr 9, 2021 at 12:09 PM HaoZ  wrote:



Re: GPU job in Spark 3

2021-04-09 Thread HaoZ
Hi Martin,

I tested local mode with the RAPIDS Accelerator for Apache Spark and it works
fine for me.
The only possible issue is CUDA 11.2: the supported CUDA version as per
https://nvidia.github.io/spark-rapids/docs/download.html is 11.0.

Here is a quick test using Spark local mode.
Note: when I was testing local mode, I made sure there was nothing in
spark-defaults.conf so everything was clean.
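
A minimal local-mode launch along those lines (just a sketch; the jar names and
the /opt/sparkRapidsPlugin path follow Martin's setup below and may differ) only
needs the plugin class and the two jars:

$SPARK_HOME/bin/spark-shell \
   --master local[*] \
   --conf spark.plugins=com.nvidia.spark.SQLPlugin \
   --jars /opt/sparkRapidsPlugin/cudf-0.18.1-cuda11.jar,/opt/sparkRapidsPlugin/rapids-4-spark_2.12-0.4.1.jar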

==
scala> val df = sc.makeRDD(1 to 100, 6).toDF
df: org.apache.spark.sql.DataFrame = [value: int]

scala> val df2 = sc.makeRDD(1 to 100, 6).toDF
df2: org.apache.spark.sql.DataFrame = [value: int]

scala> df.select( $"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").count
res0: Long = 100

scala> df.select( $"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").explain()
== Physical Plan ==
GpuColumnarToRow false
+- GpuShuffledHashJoin [a#29], [b#31], Inner, GpuBuildRight, false
   :- GpuShuffleCoalesce 2147483647
   :  +- GpuColumnarExchange gpuhashpartitioning(a#29, 10), ENSURE_REQUIREMENTS, [id=#221]
   :     +- GpuProject [value#2 AS a#29]
   :        +- GpuRowToColumnar TargetSize(2147483647)
   :           +- *(1) SerializeFromObject [input[0, int, false] AS value#2]
   :              +- Scan[obj#1]
   +- GpuCoalesceBatches RequireSingleBatch
      +- GpuShuffleCoalesce 2147483647
         +- GpuColumnarExchange gpuhashpartitioning(b#31, 10), ENSURE_REQUIREMENTS, [id=#228]
            +- GpuProject [value#8 AS b#31]
               +- GpuRowToColumnar TargetSize(2147483647)
                  +- *(2) SerializeFromObject [input[0, int, false] AS value#8]
                     +- Scan[obj#7]
==

Thanks,
Hao






Re: GPU job in Spark 3

2021-04-09 Thread Tom Graves
Hey Martin,

I would encourage you to file issues in the spark-rapids repo for questions
about that plugin: https://github.com/NVIDIA/spark-rapids/issues

I'm assuming the query ran, and that you looked at the SQL UI or the .explain()
output and saw it was on the CPU and not the GPU? I am also assuming you have
the CUDA 11.0 runtime installed (look in /usr/local). You printed the driver
version, which is 11.2, but the runtime can be a different version. You are
using the CUDA 11.0 version of the cudf library; if that didn't match the
runtime, though, it would have failed and not run anything.
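
A quick way to check what runtime is actually on the box (just a sketch; the
exact paths depend on how the CUDA toolkit was installed):

ls /usr/local | grep -i cuda   # e.g. cuda, cuda-11.0, cuda-11.2
nvcc --version                 # toolkit version, if nvcc is installed and on the PATH
nvidia-smi                     # the "CUDA Version" it reports is only the maximum the driver supports
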
The easiest way to tell why it didn't run on the GPU is to enable the config
spark.rapids.sql.explain=NOT_ON_GPU.
It will print logs to your console explaining why different operators don't run
on the GPU.
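
For example, added to the same spark-shell invocation you already use (a sketch;
the remaining flags from your original command stay as they are):

$SPARK_HOME/bin/spark-shell \
   --master local \
   --conf spark.plugins=com.nvidia.spark.SQLPlugin \
   --conf spark.rapids.sql.explain=NOT_ON_GPU \
   --jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
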
Again, feel free to open a question issue in the spark-rapids repo and we can
discuss more there.
Tom
On Friday, April 9, 2021, 11:19:05 AM CDT, Martin Somers wrote:
 
 

Re: GPU job in Spark 3

2021-04-09 Thread Sean Owen
I don't see anything in this job that would use a GPU?

On Fri, Apr 9, 2021 at 11:19 AM Martin Somers  wrote:



GPU job in Spark 3

2021-04-09 Thread Martin Somers
Hi Everyone !!

I'm trying to get an on-premise GPU instance of Spark 3 running on my Ubuntu
box, and I am following:
https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#example-join-operation

Does anyone have any insight into why the Spark job isn't being run on the GPU?
It appears to run entirely on the CPU. The Hadoop binary is installed and
appears to be functioning fine:

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)

Here is my setup on Ubuntu 20.10:


▶ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  |      :21:00.0     On |                  N/A |
|  0%   38C    P8    19W / 370W |    478MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

/opt/sparkRapidsPlugin


▶ ls
cudf-0.18.1-cuda11.jar  getGpusResources.sh  rapids-4-spark_2.12-0.4.1.jar

▶ scalac --version
Scala compiler version 2.13.0 -- Copyright 2002-2019, LAMP/EPFL and
Lightbend, Inc.


▶ spark-shell --version
2021-04-09 17:05:36,158 WARN util.Utils: Your hostname, studio resolves to
a loopback address: 127.0.1.1; using 192.168.0.221 instead (on interface
wlp71s0)
2021-04-09 17:05:36,159 WARN util.Utils: Set SPARK_LOCAL_IP if you need to
bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform
(file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor
java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.10
Branch HEAD
Compiled by user ubuntu on 2021-02-22T01:04:02Z
Revision 1d550c4e90275ab418b9161925049239227f3dc9
Url https://github.com/apache/spark
Type --help for more information.


Here is how I call Spark prior to adding the test job:

$SPARK_HOME/bin/spark-shell \
   --master local \
   --num-executors 1 \
   --conf spark.executor.cores=16 \
   --conf spark.rapids.sql.concurrentGpuTasks=1 \
   --driver-memory 10g \
   --conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
   --conf spark.rapids.memory.pinnedPool.size=16G \
   --conf spark.locality.wait=0s \
   --conf spark.sql.files.maxPartitionBytes=512m \
   --conf spark.sql.shuffle.partitions=10 \
   --conf spark.plugins=com.nvidia.spark.SQLPlugin \
   --files $SPARK_RAPIDS_DIR/getGpusResources.sh \
   --jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}


The test job is from the example join operation:

val df = sc.makeRDD(1 to 1000, 6).toDF
val df2 = sc.makeRDD(1 to 1000, 6).toDF
df.select( $"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").count


I just noticed that the Scala versions are out of sync - that shouldn't affect
it, should it?


Is there anything else I can try in the --conf, or are there any logs that would
show what might be failing behind the scenes? Any suggestions?


Thanks
Martin


-- 
M