Re: Spark with GPU

2023-02-07 Thread Alessandro Bellina
For Apache Spark, a standalone worker can manage all the resources of the
box, including all GPUs. So a Spark worker could be set up to manage the N
GPUs in the box via spark.worker.resource.gpu.amount, and then
spark.executor.resource.gpu.amount, as provided on app submit, assigns GPU
resources to executors as they come up. Here is a getting started guide for
spark-rapids; I am not sure if that's what you are looking to use, but
either way it may help with the resource setup:
https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#spark-standalone-cluster

Not every node in the cluster needs to have GPUs. You could request 0 GPUs
for an app (0 is the default value of spark.executor.resource.gpu.amount),
and the executors will then not require this resource.
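
As a minimal sketch (not from this thread: the master URL, amounts, and the
assumption that the worker was started with spark.worker.resource.gpu.amount
and a GPU discovery script are illustrative), the app-submit side could look
like this in PySpark:

# Sketch only: app-side GPU scheduling on a standalone cluster. Assumes the
# worker already advertises its GPUs; master URL and amounts are placeholders.
from pyspark import TaskContext
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gpu-resource-example")
    .master("spark://master-host:7077")                  # hypothetical master
    .config("spark.executor.resource.gpu.amount", "1")   # GPUs per executor (default 0)
    .config("spark.task.resource.gpu.amount", "0.25")    # GPU share per task
    .getOrCreate()
)

# Task code can read the GPU addresses the scheduler assigned to it:
def assigned_gpus(_):
    return TaskContext.get().resources()["gpu"].addresses

print(spark.sparkContext.parallelize(range(4), 4).map(assigned_gpus).collect())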

If you are using a yarn/k8s cluster there are other configs to pay
attention to. If you need help with those let us know.
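
For example (a sketch only; the exact settings depend on your setup and are
worth checking against the Spark and spark-rapids docs for your version), on
Kubernetes the GPU request additionally needs a vendor and an executor-side
discovery script:

# Sketch: extra GPU-related settings typically needed on a k8s cluster.
# The API server address and the script path are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://<k8s-apiserver>:6443")
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "1")
    .config("spark.executor.resource.gpu.vendor", "nvidia.com")
    .config("spark.executor.resource.gpu.discoveryScript",
            "/opt/sparkRapidsPlugin/getGpusResources.sh")
    .getOrCreate()
)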

On Sun, Feb 5, 2023 at 1:50 PM Jack Goodson  wrote:

> As far as I understand, you will need a GPU for each worker node, or you
> will need to partition the GPU processing somehow across the nodes, which I
> think would defeat the purpose. In Databricks, for example, when you select
> GPU workers there is a GPU allocated to each worker. I assume this is the
> “correct” approach to this problem.
>
> On Mon, 6 Feb 2023 at 8:17 AM, Mich Talebzadeh 
> wrote:
>
>> if you have several nodes with only one node having GPUs, you still have
>> to wait for the result set to complete. In other words, it will only be as
>> fast as the slowest node, the lowest common denominator ..
>>
>> my postulation
>>
>> HTH
>>
>>
>> On Sun, 5 Feb 2023 at 13:38, Irene Markelic  wrote:
>>
>>> Hello,
>>>
>>> has anyone used spark with GPUs? I wonder if every worker node in a
>>> cluster needs one GPU or if you can have several worker nodes of which
>>> only one has a GPU.
>>>
>>> Thank you!
>>>
>>>
>>>
>>>


Re: Error using SPARK with Rapid GPU

2022-11-30 Thread Alessandro Bellina
Vajiha filed a spark-rapids discussion here:
https://github.com/NVIDIA/spark-rapids/discussions/7205, so if you are
interested, please follow along there.

On Wed, Nov 30, 2022 at 7:17 AM Vajiha Begum S A <
vajihabegu...@maestrowiz.com> wrote:

> Hi,
> I'm using an Ubuntu system with an NVIDIA Quadro K1200 with 20GB of GPU
> memory.
> Installed: CUDF 22.10.0 jar file, rapids-4-spark 2.12-22.10.0 jar file,
> CUDA Toolkit 11.8.0 Linux version, and Java 8.
> I'm running only a single server; the master is localhost.
>
> I'm trying to run PySpark code through spark-submit and Python IDLE, and
> I'm getting errors. Kindly help me to resolve this error, and kindly give
> suggestions where I have made mistakes.
>
> *Error when running code through spark-submit:*
>spark-submit /home/mwadmin/Documents/test.py
> 22/11/30 14:59:32 WARN Utils: Your hostname, mwadmin-HP-Z440-Workstation
> resolves to a loopback address: 127.0.1.1; using ***.***.**.** instead (on
> interface eno1)
> 22/11/30 14:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
> another address
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 22/11/30 14:59:32 INFO SparkContext: Running Spark version 3.2.2
> 22/11/30 14:59:32 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 22/11/30 14:59:33 INFO ResourceUtils:
> ==
> 22/11/30 14:59:33 INFO ResourceUtils: No custom resources configured for
> spark.driver.
> 22/11/30 14:59:33 INFO ResourceUtils:
> ==
> 22/11/30 14:59:33 INFO SparkContext: Submitted application: Spark.com
> 22/11/30 14:59:33 INFO ResourceProfile: Default ResourceProfile created,
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor:
> , memory -> name: memory, amount: 1024, script: , vendor: , offHeap ->
> name: offHeap, amount: 0, script: , vendor: , gpu -> name: gpu, amount: 1,
> script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0,
> gpu -> name: gpu, amount: 0.5)
> 22/11/30 14:59:33 INFO ResourceProfile: Limiting resource is cpus at 1
> tasks per executor
> 22/11/30 14:59:33 WARN ResourceUtils: The configuration of resource: gpu
> (exec = 1, task = 0.5/2, runnable tasks = 2) will result in wasted
> resources due to resource cpus limiting the number of runnable tasks per
> executor to: 1. Please adjust your configuration.
> 22/11/30 14:59:33 INFO ResourceProfileManager: Added ResourceProfile id: 0
> 22/11/30 14:59:33 INFO SecurityManager: Changing view acls to: mwadmin
> 22/11/30 14:59:33 INFO SecurityManager: Changing modify acls to: mwadmin
> 22/11/30 14:59:33 INFO SecurityManager: Changing view acls groups to:
> 22/11/30 14:59:33 INFO SecurityManager: Changing modify acls groups to:
> 22/11/30 14:59:33 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users  with view permissions: Set(mwadmin);
> groups with view permissions: Set(); users  with modify permissions:
> Set(mwadmin); groups with modify permissions: Set()
> 22/11/30 14:59:33 INFO Utils: Successfully started service 'sparkDriver'
> on port 45883.
> 22/11/30 14:59:33 INFO SparkEnv: Registering MapOutputTracker
> 22/11/30 14:59:33 INFO SparkEnv: Registering BlockManagerMaster
> 22/11/30 14:59:33 INFO BlockManagerMasterEndpoint: Using
> org.apache.spark.storage.DefaultTopologyMapper for getting topology
> information
> 22/11/30 14:59:33 INFO BlockManagerMasterEndpoint:
> BlockManagerMasterEndpoint up
> 22/11/30 14:59:33 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> 22/11/30 14:59:33 INFO DiskBlockManager: Created local directory at
> /tmp/blockmgr-647d2c2a-72e4-402d-aeff-d7460726eb6d
> 22/11/30 14:59:33 INFO MemoryStore: MemoryStore started with capacity
> 366.3 MiB
> 22/11/30 14:59:33 INFO SparkEnv: Registering OutputCommitCoordinator
> 22/11/30 14:59:33 INFO Utils: Successfully started service 'SparkUI' on
> port 4040.
> 22/11/30 14:59:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
> http://localhost:4040
> 22/11/30 14:59:33 INFO ShimLoader: Loading shim for Spark version: 3.2.2
> 22/11/30 14:59:33 INFO ShimLoader: Complete Spark build info: 3.2.2,
> https://github.com/apache/spark, HEAD,
> 78a5825fe266c0884d2dd18cbca9625fa258d7f7, 2022-07-11T15:44:21Z
> 22/11/30 14:59:33 INFO ShimLoader: findURLClassLoader found a
> URLClassLoader org.apache.spark.util.MutableURLClassLoader@1530c739
> 22/11/30 14:59:33 INFO ShimLoader: Updating spark classloader
> org.apache.spark.util.MutableURLClassLoader@1530c739 with the URLs:
> jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark3xx-common/,
> jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark322/
> 22/11/30 14:59:33 INFO ShimLoader: Spark classLoader
> org.apache.spark.util.MutableURLClassLoader@1530c739 updated successfully
> 22/11/30 

Re: Spark with GPU

2022-08-13 Thread Alessandro Bellina
This thread may be better suited as a discussion in our Spark plug-in’s
repo:
https://github.com/NVIDIA/spark-rapids/discussions.

Just to answer the questions that were asked so far:

I would recommend checking our documentation for what is supported as of
our latest release (22.06):
https://nvidia.github.io/spark-rapids/docs/supported_ops.html, as we have
quite a bit of support for decimal and also nested types and keep adding
coverage.

For UDFs, if you are willing to rewrite them to use the RAPIDS cuDF API, we
do have support and examples of how to do this; please check out:
https://nvidia.github.io/spark-rapids/docs/additional-functionality/rapids-udfs.html.
Automatically translating UDFs to GPUs is not easy. We have a Scala UDF to
Catalyst transpiler that is able to handle very simple UDFs where every
operation has a corresponding Catalyst expression; that may be worth
checking out:
https://nvidia.github.io/spark-rapids/docs/additional-functionality/udf-to-catalyst-expressions.html.
This transpiler falls back if it can't translate any part of the UDF.
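
As a sketch only (the config names below come from the spark-rapids docs
linked above and should be checked against the release you run), the
transpiler is opt-in and can be enabled on a session that already has the
RAPIDS plug-in loaded:

# Sketch, assuming `spark` is a SparkSession started with the RAPIDS plug-in
# (spark.plugins=com.nvidia.spark.SQLPlugin). The udfCompiler setting asks
# the plug-in to try translating simple Scala UDFs into Catalyst expressions;
# any UDF it cannot translate keeps running as a regular UDF.
spark.conf.set("spark.rapids.sql.enabled", "true")
spark.conf.set("spark.rapids.sql.udfCompiler.enabled", "true")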

The plug-in will not fail in cases where it can't run part of a query on the
GPU; it will fall back and run the unsupported parts of the query on the CPU.
It will also report what it can't optimize on the driver (via .explain), which
should help narrow down an expression or exec that should be looked at
further.
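
For example (a sketch; spark.rapids.sql.explain and its values are described
in the spark-rapids configuration docs), you can ask the plug-in to log only
the pieces that will not run on the GPU:

# Sketch: surface the plug-in's fallback decisions in the driver log.
# Assumes `spark` already has the RAPIDS plug-in loaded.
spark.conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")

df = spark.range(1_000_000).selectExpr("id", "id % 7 AS bucket")
df.groupBy("bucket").count().collect()
# Any expression or exec that had to stay on the CPU is reported on the
# driver, which narrows down what to look at (or report) further.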

There are other resources all linked from here:
https://nvidia.github.io/spark-rapids/ (of interest may be the
Qualification Tool, and our Getting Started guide for different cloud
providers and distros).

I’d say let’s continue this in the discussions or as issues in the
spark-rapids repo if you have further questions or run into issues, as it’s
not specific to Apache Spark.

Thanks!

Alessandro

On Sat, Aug 13, 2022 at 10:53 AM Sean Owen  wrote:

> This isn't a Spark question, but rather a question about whatever Spark
> application you are talking about. RAPIDS?
>
> On Sat, Aug 13, 2022 at 10:35 AM rajat kumar 
> wrote:
>
>> Thanks Sean.
>>
>> Also, I observed that lots of things are not supported on the GPU by
>> NVIDIA, e.g. nested types, decimal types, UDFs, etc.
>> So, will it use the CPU automatically for running those tasks which require
>> nested types, or will it run on the GPU and fail?
>>
>> Thanks
>> Rajat
>>
>> On Sat, Aug 13, 2022, 18:54 Sean Owen  wrote:
>>
>>> Spark does not use GPUs itself, but tasks you run on Spark can.
>>> The only 'support' is for requesting GPUs as resources for tasks, so it's
>>> just a question of resource management. That's in OSS.
>>>
>>> On Sat, Aug 13, 2022 at 8:16 AM rajat kumar 
>>> wrote:
>>>
 Hello,

 I have been hearing about GPU support in Spark 3.

 For batch jobs, will a GPU help improve performance? Also, is GPU
 support available only on Databricks, or also on cloud-based Spark clusters?

 I am new; if anyone can share insight, it will help.

 Thanks
 Rajat

>>>