Splitting resource in Spark cluster

Tzahi File Sun, 29 Dec 2019 14:03:51 -0800

Hi All,

I'm using a single Spark cluster that contains 50 nodes of type i3.4xl
(16 vCores each).
I'm trying to run 4 Spark SQL queries simultaneously.


The data is split into 10 even partitions, and the 4 queries run on the same
data but on different partitions. I have tried to configure the cluster so that
each job gets the same resources and doesn't interfere with the other jobs'
resources.
When running 1 or 2 queries simultaneously I got much better performance
than with 4 queries, although I expected to get the same performance.

I'm looking for your advice on how to improve the performance by tuning the
configurations.

I have a total of 15*50 nodes
5 executors per instance
max executors 37
shuffle partitions 750
...

From what I understand, with max executors set to 37, each job should get the
same number of executors whether 1, 2, 3 or 4 jobs run in parallel, and thus
the same running time.
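
A rough sketch of the arithmetic behind that expectation. It assumes that "5 executors per instance" means 5 executor slots on each of the 50 nodes and that the cap of 37 executors applies to each job separately; both are interpretations of the settings listed above, not confirmed facts. The names are illustrative only.

```python
# Capacity check using the figures quoted in the message.
# Assumptions (not confirmed by the post): executor slots are per node,
# and the max-executors cap applies to each job independently.
NODES = 50
EXECUTORS_PER_NODE = 5        # "5 executors per instance"
MAX_EXECUTORS_PER_JOB = 37    # "max executors 37"

total_slots = NODES * EXECUTORS_PER_NODE


def executors_requested(jobs: int) -> int:
    """Executors requested if every job reaches its cap."""
    return jobs * MAX_EXECUTORS_PER_JOB


def fits(jobs: int) -> bool:
    """True if all jobs can hold their full executor count at once."""
    return executors_requested(jobs) <= total_slots


for jobs in (1, 2, 3, 4):
    print(jobs, executors_requested(jobs), total_slots, fits(jobs))
```

Under these assumptions, even 4 jobs request fewer executors than the cluster can host, so executor count alone would not explain the slowdown. The check only counts executors; it says nothing about disk, network or memory that jobs share when their executors land on the same nodes.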


Thanks,
Tzahi
