Now I see what you want to do. If you have access to the cluster
configuration files, you can modify the spark-env.sh file on the worker
nodes to specify exactly which nodes should advertise GPU resources and
which should not. Only the nodes configured with GPU resources would
then be scheduled/acquired for your GPU tasks (see the RAPIDS user
guide at
https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html).
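For example, on a standalone cluster each GPU worker's spark-env.sh
would carry something along these lines, while CPU-only workers simply
omit it (a minimal sketch; the GPU count and the discovery script path
are illustrative):

    # Advertise 4 GPUs on this worker; the discovery script reports their addresses
    SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=4 \
      -Dspark.worker.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh"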
We are using RAPIDS in our on-prem Spark environment with complete
control of the OS, file and network systems, containers, and even
hardware/GPU settings. I guess you are using one of the cloud services,
so I am not sure whether you have access to the low-level cluster
config on EMR or GCP, which give you cookie-cutter cluster settings
with limited configurability. But under the hood, I believe they do use
NVIDIA RAPIDS, which is currently the only option for GPU acceleration
in Spark (the Spark 3.x distribution package doesn't include RAPIDS or
any GPU integration libraries). So you may want to dive into the RAPIDS
instructions for more configuration and usage info (they provide
detailed instructions on how to run RAPIDS on EMR, Databricks, and GCP).
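If you do go the RAPIDS route, note that it is enabled as a plugin on
top of a stock Spark 3.x build, roughly like this (a sketch; the RAPIDS
jar still has to be put on the driver and executor classpath):

    spark.plugins=com.nvidia.spark.SQLPlugin
    spark.rapids.sql.enabled=true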
On 11/3/22 12:10 PM, Shay Elbaz wrote:
Thanks again Artemis, I really appreciate it. I have watched the video
but did not find an answer.
Please bear with me for just one more iteration 🙂
Let me be more specific:
Suppose I start the application with maxExecutors=500 and
executor.cores=2, because that's the amount of resources needed for
the ETL part. But for the DL part I only need 20 GPUs. The stage-level
scheduling API only allows setting the resources per executor/task, so
Spark would (try to) allocate up to 500 GPUs, assuming I configure the
profile with 1 GPU per executor.
So the question is: how do I limit the stage resources to 20 GPUs in total?
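In code, the profile I have in mind looks roughly like this (a sketch;
etlOutput and runInference stand in for our actual data and DL step):

    import org.apache.spark.resource.{ExecutorResourceRequests,
      ResourceProfileBuilder, TaskResourceRequests}

    // 1 GPU per executor and per task -- but nothing in the profile caps
    // the number of executors, so dynamic allocation can still scale the
    // stage toward maxExecutors (500) and thereby request up to 500 GPUs.
    val rpb = new ResourceProfileBuilder()
    rpb.require(new ExecutorResourceRequests().cores(2).resource("gpu", 1))
    rpb.require(new TaskResourceRequests().cpus(1).resource("gpu", 1.0))
    val gpuProfile = rpb.build

    val predictions = etlOutput.withResources(gpuProfile).mapPartitions(runInference)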
Thanks again,
Shay
------------------------------------------------------------------------
*From:* Artemis User <arte...@dtechspace.com>
*Sent:* Thursday, November 3, 2022 5:23 PM
*To:* user@spark.apache.org <user@spark.apache.org>
*Subject:* [EXTERNAL] Re: Re: Stage level scheduling - lower the
number of executors when using GPUs
Shay, you may find this video helpful (with some API code samples that
you are looking for):
https://www.youtube.com/watch?v=JNQu-226wUc&t=171s. The issue here
isn't how to limit the number of executors but how to request the
right GPU-enabled executors dynamically. The executors used in pre-GPU
stages should be returned to the resource manager with dynamic
resource allocation enabled (and with the right DRA policies). Hope
this helps.
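For example, the relevant dynamic allocation knobs look roughly like
this (a sketch; the timeout value is illustrative):

    spark.dynamicAllocation.enabled=true
    spark.dynamicAllocation.minExecutors=0
    spark.dynamicAllocation.executorIdleTimeout=60s

With an idle timeout set, the executors used by the CPU stages are
released once their tasks finish, so the resource manager can grant the
GPU-profile requests.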
Unfortunately there isn't much detailed documentation on this topic,
since GPU acceleration is relatively new in Spark (not as
straightforward as in TensorFlow). I wish the Spark doc team would
provide more details in the next release...
On 11/3/22 2:37 AM, Shay Elbaz wrote:
Thanks Artemis. We are *not* using RAPIDS, but rather using GPUs
through the Stage Level Scheduling feature with ResourceProfile. In
Kubernetes you have to turn on shuffle tracking for dynamic
allocation anyhow.
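Concretely, we run with both of the following, since Kubernetes has no
external shuffle service (a sketch of the two flags):

    spark.dynamicAllocation.enabled=true
    spark.dynamicAllocation.shuffleTracking.enabled=true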
The question is how we can limit the *number of executors* when
building a new ResourceProfile, directly (API) or indirectly (some
advanced workaround).
Thanks,
Shay
------------------------------------------------------------------------
*From:* Artemis User <arte...@dtechspace.com>
*Sent:* Thursday, November 3, 2022 1:16 AM
*To:* user@spark.apache.org <user@spark.apache.org>
*Subject:* [EXTERNAL] Re: Stage level scheduling - lower the number
of executors when using GPUs
Are you using RAPIDS for GPU support in Spark? A couple of options you
may want to try:
1. In addition to having dynamic allocation turned on, you may also
need to turn on the external shuffle service.
2. It sounds like you are using Kubernetes. In that case, you may also
need to turn on shuffle tracking instead.
3. The "stages" are controlled by the APIs. The APIs for dynamic
resource requests (change of stage) do exist, but only for RDDs
(e.g. TaskResourceRequest and ExecutorResourceRequest; see the
sketch after this list).
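A minimal sketch of that RDD-level API (inputRdd and gpuTask are
placeholders, and the amounts are illustrative):

    import org.apache.spark.resource.{ExecutorResourceRequests,
      ResourceProfileBuilder, TaskResourceRequests}

    // Ask for GPU-enabled executors only for the stages computing this
    // RDD; per-executor and per-task amounts are all the profile expresses.
    val profile = new ResourceProfileBuilder()
      .require(new ExecutorResourceRequests().cores(1).resource("gpu", 1))
      .require(new TaskResourceRequests().resource("gpu", 1.0))
      .build

    val out = inputRdd.withResources(profile).map(gpuTask)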
On 11/2/22 11:30 AM, Shay Elbaz wrote:
Hi,
Our typical applications need fewer *executors* for a GPU stage than
for a CPU stage. We are using dynamic allocation with stage-level
scheduling, and Spark tries to maximize the number of executors during
the GPU stage as well, causing a bit of resource chaos in the
cluster. This forces us to use a lower value for 'maxExecutors' in
the first place, at the cost of CPU-stage performance, or to try to
solve this at the Kubernetes scheduler level, which is not
straightforward and doesn't feel like the right way to go.
Is there a way to effectively use fewer executors in Stage Level
Scheduling? The API does not seem to include such an option, but
maybe there is some more advanced workaround?
Thanks,
Shay