Hi all,

I have a PySpark SQL script that loads one table of about 80 MB, one of about 2 MB, and three other small tables, and performs lots of joins to fetch the data.
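
For context, the script roughly follows the pattern below; the table and join-key names are placeholders, not the real ones:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join_job").getOrCreate()

# one ~80 MB table, one ~2 MB table, and three other small tables
# (names below are placeholders)
big_df    = spark.table("db.big_table")
mid_df    = spark.table("db.mid_table")
small1_df = spark.table("db.small_table_1")
small2_df = spark.table("db.small_table_2")
small3_df = spark.table("db.small_table_3")

# lots of joins chained to fetch the final data set
result_df = (big_df
             .join(mid_df, "key1")
             .join(small1_df, "key2")
             .join(small2_df, "key3")
             .join(small3_df, "key4"))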

My cluster configuration is:

4 nodes, 300 GB memory, 64 cores

Writing a DataFrame of about 24 MB of records into a table takes 4 minutes 2 seconds with the following parameters:

Driver memory: 5 GB
Executor memory: 20 GB
Executor cores: 5
Number of executors: 40
spark.dynamicAllocation.minExecutors: 40
spark.dynamicAllocation.maxExecutors: 40
spark.dynamicAllocation.initialExecutors: 17
Memory overhead: 4 GB

and the default 200 shuffle partitions (the spark-submit invocation is sketched below).
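
Putting those settings together, the submit command looks roughly like this (the script name is a placeholder, and I am assuming the standard property names for the dynamic allocation and overhead settings):

# dynamic allocation is assumed to be enabled explicitly
spark-submit \
  --driver-memory 5g \
  --executor-memory 20g \
  --executor-cores 5 \
  --num-executors 40 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=40 \
  --conf spark.dynamicAllocation.maxExecutors=40 \
  --conf spark.dynamicAllocation.initialExecutors=17 \
  --conf spark.executor.memoryOverhead=4g \
  --conf spark.sql.shuffle.partitions=200 \
  my_join_script.py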

Could anyone please suggest how I can tune this code?

-- 
k.Lakshmi Nivedita
