Hi All, I'm using a Spark cluster of 50 nodes of type i3.4xl (16 vCores each). I'm trying to run 4 Spark SQL queries simultaneously.
The data is split into 10 even partitions, and the 4 queries run on the same data but on different partitions. I have tried to configure the cluster so that each job gets the same resources and does not interfere with the other jobs' resources. When running 1 or 2 queries simultaneously I got much better performance than with 4 queries, although I expected to get the same performance. I'm looking for your advice on how to improve the performance by tuning the configuration.

I have a total of 15*50 cores, 5 executors per instance, max executors 37, shuffle partitions 750 ...

From what I understand, when max executors is set to 37, each job should get the same number of executors whether 1, 2, 3, or 4 jobs run in parallel, and thus the same running time.

Thanks,
Tzahi
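
As a rough sanity check of the numbers above, the arithmetic can be sketched in plain Python. This is only a sketch under stated assumptions: it assumes the 15*50 figure refers to 15 usable cores on each of the 50 nodes, and that the max executors setting of 37 applies per job. The Spark property names in `ASSUMED_SPARK_CONF` are real properties, but the mapping of the post's settings onto them is a guess, not the actual configuration used.

```python
# Rough capacity check for the setup described in the post.
# ASSUMPTION: "15*50" means 15 usable cores on each of the 50 nodes.
NODES = 50
USABLE_CORES_PER_NODE = 15      # assumed: 16 vCores with 1 left for the OS
EXECUTORS_PER_NODE = 5          # "5 executors per instance"
MAX_EXECUTORS_PER_JOB = 37      # assumed to be a per-job cap

# ASSUMPTION: how the post's settings might map onto Spark properties.
ASSUMED_SPARK_CONF = {
    "spark.dynamicAllocation.maxExecutors": str(MAX_EXECUTORS_PER_JOB),
    "spark.sql.shuffle.partitions": "750",
    "spark.executor.cores": str(USABLE_CORES_PER_NODE // EXECUTORS_PER_NODE),
}


def capacity(concurrent_jobs: int) -> dict:
    """Compare the executors requested by N jobs with the cluster's slots."""
    total_cores = NODES * USABLE_CORES_PER_NODE
    executor_slots = NODES * EXECUTORS_PER_NODE
    cores_per_executor = USABLE_CORES_PER_NODE // EXECUTORS_PER_NODE
    requested = concurrent_jobs * MAX_EXECUTORS_PER_JOB
    return {
        "total_cores": total_cores,
        "cores_per_executor": cores_per_executor,
        "executor_slots": executor_slots,
        "executors_requested": requested,
        "fits": requested <= executor_slots,
    }


if __name__ == "__main__":
    for jobs in (1, 2, 3, 4):
        print(jobs, capacity(jobs))
```

If this reading of the numbers is right, four jobs capped at 37 executors each request fewer executors than the cluster has slots for, so the executor count alone would not seem to explain the slowdown.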