AFAIK there is no way to set it per Exchange.
If you are using Spark 3.2+, AQE (Adaptive Query Execution, see Performance 
Tuning - Spark Documentation 
<https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution>) 
is enabled by default. It can automatically change the post-shuffle partition 
count at each exchange, which should help your use case. The feature itself is 
available from Spark 3.0.
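As a rough sketch, the relevant AQE settings might look like the following (the size and partition-count values are illustrative assumptions, not recommendations):

```properties
# Enable AQE (default since Spark 3.2)
spark.sql.adaptive.enabled=true
# Let AQE coalesce small post-shuffle partitions at each exchange
spark.sql.adaptive.coalescePartitions.enabled=true
# Upper bound AQE starts from before coalescing; illustrative value
spark.sql.adaptive.coalescePartitions.initialPartitionNum=2000
# Target size per post-shuffle partition; illustrative value
spark.sql.adaptive.advisoryPartitionSizeInBytes=64MB
```

With this, shuffles feeding big-table joins keep many partitions while shuffles over small tables get coalesced down automatically.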

From: Anupam Singh <anupamsngh...@gmail.com>
Sent: Saturday, September 10, 2022 10:23 PM
To: Vibhor Gupta <vibhor.gu...@walmart.com.invalid>
Cc: user@spark.apache.org
Subject: [EXTERNAL] Re: Dynamic shuffle partitions in a single job

Commenting for better reach :)

On Thu, Sep 8, 2022, 11:56 AM Vibhor Gupta 
<vibhor.gu...@walmart.com.invalid<mailto:vibhor.gu...@walmart.com.invalid>> 
wrote:
Hi Community,

Is it possible to set the number of shuffle partitions per exchange?

My Spark query contains a lot of joins/aggregations involving both big and 
small tables. Keeping a high value of spark.sql.shuffle.partitions helps with 
the big tables, but for the small tables it creates a lot of overhead on the 
scheduler.

Cheers,
Vibhor
