So I'm not sure I completely follow. Are you asking for a way to change the
limit without having to do the repartition? And your DL software doesn't care
if you got, say, 30 executors instead of 20? Normally I would expect the number
of partitions at that point to be 200 (or whatever you set for your shuffle
partitions) unless you are using the AQE partition-coalescing functionality, in
which case it could change. Are you using the latter?
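To illustrate how AQE can change the task count: with `spark.sql.adaptive.enabled` and `spark.sql.adaptive.coalescePartitions.enabled` turned on, Spark may merge adjacent small shuffle partitions toward the size set by `spark.sql.adaptive.advisoryPartitionSizeInBytes`, so the DL stage can end up with fewer tasks than `spark.sql.shuffle.partitions`. The Python below is a hypothetical, simplified model of that greedy merging (it ignores the minimum partition size and other refinements Spark applies), not Spark's actual code. The 10 MB partition size and 64 MB target are made-up numbers.

```python
from typing import List


def coalesce_partitions(sizes_mb: List[int], target_mb: int) -> List[List[int]]:
    """Greedily merge adjacent shuffle partitions into groups of roughly target_mb.

    Hypothetical, simplified model of AQE partition coalescing: each returned
    group of partition indices would become one task in the next stage.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_mb = 0
    for index, size in enumerate(sizes_mb):
        # Start a new group when adding this partition would overshoot the target.
        if current and current_mb + size > target_mb:
            groups.append(current)
            current, current_mb = [], 0
        current.append(index)
        current_mb += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # 200 shuffle partitions (the default spark.sql.shuffle.partitions) of 10 MB each,
    # coalesced toward a hypothetical 64 MB advisory size.
    groups = coalesce_partitions([10] * 200, target_mb=64)
    print(len(groups))  # 34 tasks instead of 200
```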
> Normally I try to aim for anything between 30s-5m per task (failure-wise),
> depending on the cluster, its stability, etc. But in this case, individual
> tasks can take 30-60 minutes, if not much more. Any failure during this long
> time is pretty expensive.
Are you saying that when you manually do the repartition, your DL tasks take 30-60
minutes? So again, you want something like AQE partition coalescing to kick in
and pick partition sizes for you?
Tom
On Thursday, November 3, 2022 at 03:18:07 PM CDT, Shay Elbaz
wrote:
This is exactly what we ended
up doing! The only drawback I saw with this approach is that the GPU tasks get
pretty big (in terms of data and compute time), and task failures become
expensive. That's why I reached out to the mailing list in the first place 🙂
Normally I try to aim for anything between 30s-5m per task (failure-wise),
depending on the cluster, its stability, etc. But in this case, individual
tasks can take 30-60 minutes, if not much more. Any failure during this long
time is pretty expensive.
Shay
From: Tom Graves
Sent: Thursday, November 3, 2022 7:56 PM
To: Artemis User; user@spark.apache.org; Shay Elbaz
Subject: [EXTERNAL] Re: Re: Re: Stage level scheduling - lower the number of
executors when using GPUs
ATTENTION: This email originated from outside of GM.
Stage level scheduling does not allow you to change configs right now. This is
something we thought about as a follow-on but have never implemented. How many
tasks are you running in the DL stage? The typical case is: run some ETL with
lots of tasks, then do a mapPartitions and run your DL stuff. Before that
mapPartitions you could do a repartition, if necessary, to get exactly the
number of tasks you want (20). That way, even if maxExecutors=500, you will only
ever need 20 (or whatever you repartition to), and Spark isn't going to ask for
more than that.
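Why repartitioning caps the request: dynamic allocation sizes its executor request from the number of pending and running tasks, divided by how many tasks one executor can run at once, and clamps the result to the configured min/max executors. The function below is a simplified Python model of that calculation (the `allocation_ratio` argument mirrors `spark.dynamicAllocation.executorAllocationRatio`); it is an illustration, not Spark's implementation. Assuming 1 GPU per executor and 1 GPU per task, each executor runs one DL task, so the GPU count follows the partition count, e.g. after a `repartition(20)` placed before the `mapPartitions` call.

```python
import math


def target_executors(num_tasks: int, tasks_per_executor: int, max_executors: int,
                     min_executors: int = 0, allocation_ratio: float = 1.0) -> int:
    """Simplified model of the executor target dynamic allocation computes for one stage."""
    # Enough executors to run every pending/running task concurrently...
    needed = math.ceil(num_tasks * allocation_ratio / tasks_per_executor)
    # ...clamped to the configured [minExecutors, maxExecutors] range.
    return max(min_executors, min(needed, max_executors))


if __name__ == "__main__":
    # One GPU per executor and per task -> one DL task slot per executor.
    print(target_executors(200, 1, max_executors=500))   # 200 (default shuffle partitions)
    print(target_executors(20, 1, max_executors=500))    # 20 (after repartition(20))
    print(target_executors(2000, 1, max_executors=500))  # 500 (capped by maxExecutors)
```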
Tom
On Thursday, November 3, 2022 at 11:10:31 AM CDT, Shay Elbaz
wrote:
Thanks again, Artemis. I really appreciate it.
I have watched the video but did not find an answer.
Please bear with me for just one more iteration 🙂
Maybe I'll be more specific: suppose I start the application with
maxExecutors=500, executor.cores=2, because that's the amount of resources
needed for the ETL part. But for the DL part I only need 20 GPUs. The SLS API
only allows setting the resources per executor/task, so Spark would (try to)
allocate up to 500 GPUs, assuming I configure the profile with 1 GPU per
executor. So the question is: how do I limit the stage resources to 20 GPUs total?
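The arithmetic behind this question can be sketched as follows. Assuming the GPU profile asks for 2 cores and 1 GPU per executor, and 1 CPU plus 1 GPU per task (only the 1 GPU per executor is stated above; the other amounts are assumptions), the GPU is the scarcest resource, so each executor offers one task slot. If the DL stage had at least 500 tasks, a cap of 500 executors would translate into 500 GPUs. This is a hypothetical Python model, not a Spark API.

```python
def tasks_per_executor(executor_cores: int, task_cpus: int,
                       executor_gpus: float, task_gpus: float) -> int:
    """Concurrent task slots per executor: the scarcest resource decides."""
    slots_by_cpu = executor_cores // task_cpus
    slots_by_gpu = int(executor_gpus / task_gpus)
    return min(slots_by_cpu, slots_by_gpu)


def max_gpus_requested(max_executors: int, executor_gpus: int) -> int:
    """Upper bound on GPUs the profile can request when allocation hits maxExecutors."""
    return max_executors * executor_gpus


if __name__ == "__main__":
    # Assumed GPU profile: 2 cores + 1 GPU per executor, 1 CPU + 1 GPU per task.
    print(tasks_per_executor(executor_cores=2, task_cpus=1, executor_gpus=1, task_gpus=1))  # 1
    print(max_gpus_requested(max_executors=500, executor_gpus=1))  # 500, versus 20 needed
```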
Thanks again,
Shay
From: Artemis User
Sent: Thursday, November 3, 2022 5:23 PM
To: user@spark.apache.org
Subject: [EXTERNAL] Re: Re: Stage level scheduling - lower the number of
executors when using GPUs
Shay, you may find this video helpful (with some API code samples that you
are looking for): https://www.youtube.com/watch?v=JNQu-226wUc&t=171s. The
issue here isn't how to limit the number of executors but how to request the
right GPU-enabled executors dynamically. The executors used in pre-GPU
stages should be returned to the resource manager with dynamic resource
allocation enabled (and with the right DRA policies). Hope this helps.
Unfortunately there aren't a lot of detailed docs on this topic, since GPU
acceleration is fairly new in Spark (not as straightforward as in TF). I wish
the Spark doc team could provide more details in the next release...
On 11/3/22 2:37 AM, Shay Elbaz wrote:
Thanks, Artemis. We are not using Rapids,
but rather using GPUs through the Stage Level Scheduling feature with
ResourceProfile. In Kubernetes you have to turn on shuffle tracking for dynamic
allocation anyhow. The question is how we can limit the number of executors when
building a new ResourceProfile, directly (API) or indirectly (some advanced
workaround).
Thanks,
Shay
From: Artemis User
Sent: Thursday, November 3, 2022 1:16 AM
To: user@spark.apache.org
Subject: [EXTERNAL] Re: Stage level scheduling - lower the number of executors
when using GPUs
Are you using Rapids for GPU support in Spark? A couple of options you may
want to try:
- In addition to turning on dynamic allocation, you may also need to turn on
the external shuffle service.
- It sounds like you are using Kubernetes. In that case, you may also need to
turn on shuffle tracking.
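The options above can be collected into a configuration sketch. The Python below builds `--conf` arguments for `spark-submit`: `spark.dynamicAllocation.shuffleTracking.enabled` on Kubernetes, `spark.shuffle.service.enabled` (the external shuffle service) elsewhere. The `maxExecutors` value and both timeout values are placeholders, not recommendations; including `spark.dynamicAllocation.shuffleTracking.timeout` is an assumption about how one might let executors that still hold shuffle data (such as the ETL executors) be released.

```python
from typing import Dict, List


def dynamic_allocation_conf(on_kubernetes: bool, max_executors: int) -> Dict[str, str]:
    """Spark properties for dynamic allocation, following the options listed above."""
    conf = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Release executors that have been idle this long (placeholder value).
        "spark.dynamicAllocation.executorIdleTimeout": "60s",
    }
    if on_kubernetes:
        # No external shuffle service on Kubernetes: track shuffle files instead.
        conf["spark.dynamicAllocation.shuffleTracking.enabled"] = "true"
        # Allow executors still holding shuffle data to be released (placeholder value).
        conf["spark.dynamicAllocation.shuffleTracking.timeout"] = "30min"
    else:
        # YARN/standalone: keep shuffle files available after executors are removed.
        conf["spark.shuffle.service.enabled"] = "true"
    return conf


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as spark-submit --conf arguments."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(dynamic_allocation_conf(on_kubernetes=True, max_executors=500))))
```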