Re: How to set a config for a single query?

Shay Elbaz Wed, 04 Jan 2023 02:05:56 -0800

Hi Felipe,

I had the same problem: I needed to run multiple jobs/actions from multiple 
threads, with slightly different SQL configs per job (mainly 
spark.sql.shuffle.partitions). I'm not sure this is the best solution, but I 
ended up calling newSession() per thread. It works well, except that the new 
SparkSession does not contain the custom configurations from the original 
session, so I had to re-apply the important ones (catalogs, etc.) on each new 
session. Hope that helps.
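
For illustration, here is a minimal sketch of the session-per-thread approach, 
not the exact code I used. The object name PerThreadSessions, the helper 
isolatedSession, and the carryOver parameter are hypothetical names made up for 
this example; only newSession(), conf.set, conf.get and conf.getOption are Spark 
API calls. It assumes the keys you carry over or override can be set at runtime 
(static configs cannot be changed on a session).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object PerThreadSessions {
  // Hypothetical helper: derive a session with its own SQL conf from `base`.
  // newSession() shares the SparkContext but starts with a separate SQL conf,
  // so the keys listed in `carryOver` are copied from `base` explicitly.
  def isolatedSession(
      base: SparkSession,
      carryOver: Seq[String],
      overrides: Map[String, String]): SparkSession = {
    val session = base.newSession()
    carryOver.foreach { key =>
      base.conf.getOption(key).foreach(value => session.conf.set(key, value))
    }
    overrides.foreach { case (key, value) => session.conf.set(key, value) }
    session
  }

  // Runs two jobs concurrently, each with its own shuffle partition setting.
  def run(base: SparkSession): Seq[(String, Long)] = {
    val key = "spark.sql.shuffle.partitions"
    val jobs = Seq("4", "8").map { partitions =>
      Future {
        val session = isolatedSession(base, Seq.empty, Map(key -> partitions))
        // Build the query from the new session so that it uses that session's conf.
        val buckets = session.range(0, 1000)
          .groupBy((col("id") % 10).as("bucket"))
          .count()
          .count()
        (session.conf.get(key), buckets)
      }
    }
    Await.result(Future.sequence(jobs), Duration.Inf)
  }
}
```

One thing to watch, as far as I understand it: a DataFrame stays tied to the 
session that created it, so the queries have to be built from the new session 
(session.sql, session.table, session.read, ...) rather than reused from the 
original one. Temporary views are also per session and would need to be 
recreated.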

Shay
________________________________
From: Saurabh Gulati <[email protected]>
Sent: Wednesday, January 4, 2023 11:54 AM
To: Felipe Pessoto <[email protected]>; [email protected] 
<[email protected]>
Subject: [EXTERNAL] Re: How to set a config for a single query?

Hey Felipe,
Since you are collecting the DataFrames, you might as well run them separately 
with the desired configs and store the results in your storage.

Regards
Saurabh
________________________________
From: Felipe Pessoto <[email protected]>
Sent: 04 January 2023 01:14
To: [email protected] <[email protected]>
Subject: [EXTERNAL] How to set a config for a single query?

Hi,

In Scala, is it possible to set a config value for a single query?

I could set/unset the value, but that won’t work in multithreaded scenarios.

Example:

val key = "spark.sql.adaptive.coalescePartitions.enabled"
val original = spark.conf.get(key)

spark.conf.set(key, "false")
queryA_df.collect()

spark.conf.set(key, original)  // restore the original value
queryB_df.collect()
queryC_df.collect()
queryD_df.collect()

If I execute that block of code multiple times using multiple threads, I can 
end up executing Query A with coalescePartitions.enabled=true, and Queries B, C 
and D with the config set to false, because another thread could change it 
between the executions.

Is there any good alternative to this?

Thanks.
