Thanks, Artemis. We are not using Rapids; we are using GPUs through the Stage 
Level Scheduling feature with ResourceProfile. On Kubernetes you have to turn on 
shuffle tracking for dynamic allocation anyway.
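For reference, here is roughly how we enable it (just a minimal sketch; the app 
name and values are placeholders):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: dynamic allocation with shuffle tracking on Kubernetes
    // (no external shuffle service). App name and values are placeholders.
    val spark = SparkSession.builder()
      .appName("gpu-stage-app")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
      .getOrCreate()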
The question is how we can limit the number of executors when building a new 
ResourceProfile, either directly (via the API) or indirectly (through some 
advanced workaround).
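For illustration, this is the kind of profile we build today (a rough sketch 
only; the resource amounts and the discovery script path are placeholders). As 
far as we can tell, the builder only accepts executor/task resource requests and 
has no executor-count setter:

    import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

    // Rough sketch of a GPU-stage profile; amounts and the discovery
    // script path are placeholders.
    val execReqs = new ExecutorResourceRequests()
      .cores(4)
      .resource("gpu", 1, "/opt/getGpusResources.sh", "nvidia.com")

    val taskReqs = new TaskResourceRequests()
      .cpus(1)
      .resource("gpu", 1)

    // The builder only takes resource requests; there is no setter
    // for the number of executors.
    val gpuProfile = new ResourceProfileBuilder()
      .require(execReqs)
      .require(taskReqs)
      .build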

Thanks,
Shay


________________________________
From: Artemis User <arte...@dtechspace.com>
Sent: Thursday, November 3, 2022 1:16 AM
To: user@spark.apache.org <user@spark.apache.org>
Subject: [EXTERNAL] Re: Stage level scheduling - lower the number of executors 
when using GPUs


  Are you using Rapids for GPU support in Spark?  A couple of options you may 
want to try:

  1.  In addition to having dynamic allocation turned on, you may also need to 
turn on the external shuffle service.
  2.  It sounds like you are using Kubernetes.  In that case, you may also need 
to turn on shuffle tracking.
  3.  The "stages" are controlled by the APIs.  The APIs for dynamic resource 
requests (changing resources per stage) do exist, but only for RDDs (e.g. 
TaskResourceRequest and ExecutorResourceRequest); see the sketch below.
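A rough sketch of what that RDD-level API looks like (the profile would be built 
with a ResourceProfileBuilder as in the stage-level scheduling docs; the input 
RDD and the GPU function here are hypothetical):

    // Illustrative only: attach a per-stage profile to an RDD.
    // gpuProfile would be built from ExecutorResourceRequests /
    // TaskResourceRequests through a ResourceProfileBuilder.
    val gpuStage = inputRdd                          // hypothetical upstream RDD
      .repartition(64)                               // placeholder partition count
      .withResources(gpuProfile)                     // the stage computing this RDD uses the profile
      .mapPartitions(iter => runGpuInference(iter))  // hypothetical GPU work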

On 11/2/22 11:30 AM, Shay Elbaz wrote:
Hi,

Our typical applications need fewer executors for a GPU stage than for a CPU 
stage. We are using dynamic allocation with stage level scheduling, and Spark 
tries to maximize the number of executors during the GPU stage as well, causing 
a bit of resource chaos in the cluster. This forces us to use a lower value for 
'maxExecutors' in the first place, at the cost of CPU stage performance. The 
alternative is to try to solve this at the Kubernetes scheduler level, which is 
not straightforward and doesn't feel like the right way to go.
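To make the current workaround concrete (a minimal sketch; the cap value is just 
a placeholder):

    import org.apache.spark.sql.SparkSession

    // Blunt workaround: cap executors for the whole application, which also
    // limits the CPU stages. The value is a placeholder.
    val spark = SparkSession.builder()
      .config("spark.dynamicAllocation.maxExecutors", "20")
      .getOrCreate()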

Is there a way to effectively use fewer executors in Stage Level Scheduling? The 
API does not seem to include such an option, but maybe there is some more 
advanced workaround?

Thanks,
Shay





