AFAIK there is no way to set it manually at each Exchange. However, if you are using Spark 3.2+, Adaptive Query Execution (AQE) — see Performance Tuning - Spark 3.3.0 Documentation (apache.org)<https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution> — is enabled by default. AQE can automatically coalesce the post-shuffle partition number at each exchange, which should help your use case. The feature itself is available from Spark 3.0, but there it must be enabled explicitly.
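For reference, a minimal sketch of the relevant AQE settings (the values below are illustrative placeholders, not tuned recommendations — pick sizes based on your own workload):

```shell
# Enable AQE and post-shuffle partition coalescing.
# On Spark 3.2+ spark.sql.adaptive.enabled is already true by default;
# on Spark 3.0/3.1 it must be set explicitly.
spark-submit \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.sql.adaptive.coalescePartitions.enabled=true \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf spark.sql.adaptive.coalescePartitions.initialPartitionNum=2000 \
  --conf spark.sql.adaptive.advisoryPartitionSizeInBytes=64MB \
  your_job.py
```

With this setup you keep a high initial partition count for the big-table joins, and AQE merges small post-shuffle partitions down toward the advisory size, so the small-table exchanges no longer flood the scheduler with tiny tasks.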
From: Anupam Singh <anupamsngh...@gmail.com>
Sent: Saturday, September 10, 2022 10:23 PM
To: Vibhor Gupta <vibhor.gu...@walmart.com.invalid>
Cc: user@spark.apache.org
Subject: [EXTERNAL] Re: Dynamic shuffle partitions in a single job

Commenting for better reach :)

On Thu, Sep 8, 2022, 11:56 AM Vibhor Gupta <vibhor.gu...@walmart.com.invalid> wrote:

Hi Community,

Is it possible to set the number of shuffle partitions per exchange?

My Spark query contains a lot of joins/aggregations involving big tables and small tables. Keeping a high value of spark.sql.shuffle.partitions helps with the big tables, but for the small tables it creates a lot of overhead on the scheduler.

Cheers,
Vibhor