Re: GPU job in Spark 3
Hi,

Completely agree with Hao. If you are using YARN, have a look at the EMR documentation on how to enable the GPU as a resource in YARN before trying to use it in Spark. This is one of the most exciting features of Spark 3, and you can reap huge benefits from it :)

Regards,
Gourav Sengupta

On Fri, Apr 9, 2021 at 6:10 PM HaoZ wrote:
> [quoted reply trimmed; see Hao's message below]
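For reference, a rough sketch of the Spark-side properties that GPU scheduling on YARN usually involves. The config names are standard Spark 3.x resource-scheduling properties, but the amounts and the discovery-script path below are illustrative, not a tested recipe, and the yarn.io/gpu resource plugin must already be enabled in yarn-site.xml per the Hadoop/EMR documentation:

    $SPARK_HOME/bin/spark-shell \
      --master yarn \
      --conf spark.executor.resource.gpu.amount=1 \
      --conf spark.task.resource.gpu.amount=0.25 \
      --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
      --files /opt/sparkRapidsPlugin/getGpusResources.sh \
      --conf spark.plugins=com.nvidia.spark.SQLPlugin

The discovery script tells each executor which GPU addresses YARN allocated to it; a fractional spark.task.resource.gpu.amount lets several tasks share one GPU.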
Re: GPU job in Spark 3
(I apologize, I totally missed that this should use GPUs because of RAPIDS. Ignore my previous reply. But yeah, it's more of a RAPIDS question.)

On Fri, Apr 9, 2021 at 12:09 PM HaoZ wrote:
> [quoted reply trimmed; see Hao's message below]
Re: GPU job in Spark 3
Hi Martin,

I tested local mode with the RAPIDS Accelerator for Spark and it works fine for me. The only possible issue is CUDA 11.2: the supported CUDA version, per https://nvidia.github.io/spark-rapids/docs/download.html, is 11.0.

Here is a quick test using Spark local mode.
Note: when I ran this local-mode test, I made sure there was nothing in spark-defaults.conf, so everything is clean.

==
scala> val df = sc.makeRDD(1 to 100, 6).toDF
df: org.apache.spark.sql.DataFrame = [value: int]

scala> val df2 = sc.makeRDD(1 to 100, 6).toDF
df2: org.apache.spark.sql.DataFrame = [value: int]

scala> df.select($"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").count
res0: Long = 100

scala> df.select($"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").explain()
== Physical Plan ==
GpuColumnarToRow false
+- GpuShuffledHashJoin [a#29], [b#31], Inner, GpuBuildRight, false
   :- GpuShuffleCoalesce 2147483647
   :  +- GpuColumnarExchange gpuhashpartitioning(a#29, 10), ENSURE_REQUIREMENTS, [id=#221]
   :     +- GpuProject [value#2 AS a#29]
   :        +- GpuRowToColumnar TargetSize(2147483647)
   :           +- *(1) SerializeFromObject [input[0, int, false] AS value#2]
   :              +- Scan[obj#1]
   +- GpuCoalesceBatches RequireSingleBatch
      +- GpuShuffleCoalesce 2147483647
         +- GpuColumnarExchange gpuhashpartitioning(b#31, 10), ENSURE_REQUIREMENTS, [id=#228]
            +- GpuProject [value#8 AS b#31]
               +- GpuRowToColumnar TargetSize(2147483647)
                  +- *(2) SerializeFromObject [input[0, int, false] AS value#8]
                     +- Scan[obj#7]
==

Thanks,
Hao
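A minimal local-mode launch that should reproduce the session above -- a sketch, assuming the cudf and plugin jars from this thread sit in /opt/sparkRapidsPlugin; adjust paths and versions to your install:

    $SPARK_HOME/bin/spark-shell \
      --master local \
      --conf spark.plugins=com.nvidia.spark.SQLPlugin \
      --jars /opt/sparkRapidsPlugin/cudf-0.18.1-cuda11.jar,/opt/sparkRapidsPlugin/rapids-4-spark_2.12-0.4.1.jar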
Re: GPU job in Spark 3
Hey Martin,

I would encourage you to file issues in the spark-rapids repo for questions about that plugin: https://github.com/NVIDIA/spark-rapids/issues

I'm assuming the query ran, and you looked at the SQL UI or the .explain() output and it was on the CPU and not the GPU? I am also assuming you have the CUDA 11.0 runtime installed (look in /usr/local). You printed the driver version, which is 11.2, but the runtime can be different. You are using the CUDA 11.0 version of the cudf library; if that didn't match the runtime, though, it would have failed and not run anything.

The easiest way to tell why something didn't run on the GPU is to enable the config:

    spark.rapids.sql.explain=NOT_ON_GPU

It will print logs to your console explaining why different operators don't run on the GPU.

Again, feel free to open a question issue in the spark-rapids repo and we can discuss more there.

Tom

On Friday, April 9, 2021, 11:19:05 AM CDT, Martin Somers wrote:
> [quoted message trimmed; Martin's original message appears in full below]
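If the shell is already running, the same switch can be flipped before re-running the join -- a sketch, assuming spark.rapids.sql.explain is settable at runtime (otherwise pass it as --conf at launch); spark and sc are the session and context the shell provides:

    // Ask the RAPIDS plugin to log, per operator, why it stayed on the CPU
    spark.conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")

    // Re-run the test join; the reasons show up in the driver console
    val df  = sc.makeRDD(1 to 1000, 6).toDF
    val df2 = sc.makeRDD(1 to 1000, 6).toDF
    df.select($"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").count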
Re: GPU job in Spark 3
I don't see anything in this job that would use a GPU?

On Fri, Apr 9, 2021 at 11:19 AM Martin Somers wrote:
> [quoted message trimmed; Martin's original message appears in full below]
GPU job in Spark 3
Hi Everyone!

I'm trying to get an on-premise GPU instance of Spark 3 running on my Ubuntu box, following:

https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#example-join-operation

Does anyone have any insight into why the Spark job isn't being run on the GPU? It appears to run entirely on the CPU. The Hadoop binary is installed and appears to be functioning fine:

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)

Here is my setup on Ubuntu 20.10:

▶ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:21:00.0  On |                  N/A |
|  0%   38C    P8    19W / 370W |    478MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

/opt/sparkRapidsPlugin
▶ ls
cudf-0.18.1-cuda11.jar  getGpusResources.sh  rapids-4-spark_2.12-0.4.1.jar

▶ scalac --version
Scala compiler version 2.13.0 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.

▶ spark-shell --version
2021-04-09 17:05:36,158 WARN util.Utils: Your hostname, studio resolves to a loopback address: 127.0.1.1; using 192.168.0.221 instead (on interface wlp71s0)
2021-04-09 17:05:36,159 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.10
Branch HEAD
Compiled by user ubuntu on 2021-02-22T01:04:02Z
Revision 1d550c4e90275ab418b9161925049239227f3dc9
Url https://github.com/apache/spark
Type --help for more information.

Here is how I call spark-shell prior to adding the test job:

$SPARK_HOME/bin/spark-shell \
  --master local \
  --num-executors 1 \
  --conf spark.executor.cores=16 \
  --conf spark.rapids.sql.concurrentGpuTasks=1 \
  --driver-memory 10g \
  --conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
  --conf spark.rapids.memory.pinnedPool.size=16G \
  --conf spark.locality.wait=0s \
  --conf spark.sql.files.maxPartitionBytes=512m \
  --conf spark.sql.shuffle.partitions=10 \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --files $SPARK_RAPIDS_DIR/getGpusResources.sh \
  --jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}

The test job is from the example join operation:

val df = sc.makeRDD(1 to 1000, 6).toDF
val df2 = sc.makeRDD(1 to 1000, 6).toDF
df.select($"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").count

I just noticed that the Scala versions are out of sync (2.13.0 vs 2.12.10) -- that shouldn't affect it?

Is there anything else I can try in the --conf, or are there any logs to see what might be failing behind the scenes? Any suggestions?

Thanks
Martin

--
M
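A quick check that ties the replies above together -- a sketch to run inside the spark-shell session launched as shown, with df and df2 from the test job above; if the physical plan lists Gpu* operators the plugin picked up the query, and if it lists the plain CPU operators it did not:

    // Confirm the RAPIDS plugin was registered with this session
    println(sc.getConf.get("spark.plugins", "(not set)"))

    // Inspect the plan: GpuProject / GpuShuffledHashJoin mean the GPU path
    // is active; Project / SortMergeJoin (or similar) mean it stayed on CPU
    df.select($"value" as "a").join(df2.select($"value" as "b"), $"a" === $"b").explain()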