Hi Felipe, I had the same problem - I needed to execute multiple jobs/actions multithreaded, with slightly different SQL configs per job (mainly spark.sql.shuffle.partitions). I'm not sure if this is the best solution, but I ended up using newSession() per thread. It works well, except that the new SparkSession does not contain custom configurations from the original session, so I had to re-apply the important configurations (catalogs, etc.) on the new sessions as well. Hope that helps.
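For illustration, here is a minimal, untested sketch of the newSession()-per-thread idea in Scala. The object name PerQueryConf, the helper withIsolatedConf and its inherit/overrides parameters are made up for this example and are not part of Spark. The query here is just a stand-in built with range(). One assumption worth stating: a DataFrame stays bound to the session that created it, so the query has to be built from the child session inside the block, not from a DataFrame created earlier on the original session.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object PerQueryConf {
  // Hypothetical helper (not a Spark API): runs `body` against a child session.
  // newSession() shares the SparkContext and cached data with `base`, but has
  // its own SQL configuration, temporary views and registered functions.
  def withIsolatedConf[T](
      base: SparkSession,
      inherit: Seq[String],            // keys to copy over from the original session
      overrides: Map[String, String]   // per-query settings
  )(body: SparkSession => T): T = {
    val session = base.newSession()
    // Re-apply custom settings that the child session did not pick up.
    // Only runtime-settable keys should be listed here.
    inherit.foreach { key =>
      base.conf.getOption(key).foreach(value => session.conf.set(key, value))
    }
    overrides.foreach { case (key, value) => session.conf.set(key, value) }
    body(session)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")              // assumption: local run, for the sketch only
      .appName("per-query-conf")
      .getOrCreate()

    // "Query A": coalescing disabled, visible only inside its own session.
    val queryA: Future[Array[Row]] = Future {
      withIsolatedConf(
        spark,
        inherit = Seq("spark.sql.shuffle.partitions"),
        overrides = Map("spark.sql.adaptive.coalescePartitions.enabled" -> "false")
      ) { s =>
        s.range(1000).selectExpr("id % 10 AS k").groupBy("k").count().collect()
      }
    }

    // "Query B": runs concurrently on the original session, whose settings
    // are not touched by the thread running Query A.
    val queryB: Future[Array[Row]] = Future {
      spark.range(1000).selectExpr("id % 10 AS k").groupBy("k").count().collect()
    }

    Await.result(queryA, Duration.Inf)
    Await.result(queryB, Duration.Inf)
    spark.stop()
  }
}
```

Because each thread only ever sets values on its own child session, there is no set/unset window for another thread to land in. The cost is the one mentioned above: anything set at runtime on the original session has to be listed in inherit (or re-applied some other way), and temporary views registered on the original session are not visible in the child.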
Shay

________________________________
From: Saurabh Gulati <saurabh.gul...@fedex.com.INVALID>
Sent: Wednesday, January 4, 2023 11:54 AM
To: Felipe Pessoto <felipepess...@hotmail.com>; user@spark.apache.org <user@spark.apache.org>
Subject: [EXTERNAL] Re: How to set a config for a single query?

Hey Felipe,

Since you are collecting the dataframes, you might as well run them separately with the desired configs and store them in your storage.

Regards
Saurabh

________________________________
From: Felipe Pessoto <felipepess...@hotmail.com>
Sent: 04 January 2023 01:14
To: user@spark.apache.org <user@spark.apache.org>
Subject: [EXTERNAL] How to set a config for a single query?

Hi,

In Scala, is it possible to set a config value for a single query?

I could set/unset the value, but it won’t work for multithreading scenarios.

Example:

    spark.sql.adaptive.coalescePartitions.enabled = false
    queryA_df.collect
    spark.sql.adaptive.coalescePartitions.enabled = original value
    queryB_df.collect
    queryC_df.collect
    queryD_df.collect

If I execute that block of code multiple times using multiple threads, I can end up executing Query A with coalescePartitions.enabled=true, and Queries B, C and D with the config set to false, because another thread could set it between the executions.

Is there any good alternative to this?

Thanks.