Hi,

I'm trying to understand how this works underneath. Let's say I have two
types of jobs: high-importance jobs, which use only a small number of cores
but have to run fast, and less important but greedy jobs, which use as many
cores as are available. So the idea is to use two corresponding pools.
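
For concreteness, here is a minimal sketch of the setup I have in mind (the
pool names, weights, and the fairscheduler.xml contents below are just
placeholders, not my real config):

  import org.apache.spark.{SparkConf, SparkContext}

  // conf/fairscheduler.xml (placeholder values):
  //   <allocations>
  //     <pool name="high">
  //       <schedulingMode>FAIR</schedulingMode>
  //       <weight>10</weight>
  //       <minShare>4</minShare>
  //     </pool>
  //     <pool name="greedy">
  //       <schedulingMode>FIFO</schedulingMode>
  //       <weight>1</weight>
  //       <minShare>0</minShare>
  //     </pool>
  //   </allocations>

  val conf = new SparkConf()
    .setAppName("two-pools")
    .set("spark.scheduler.mode", "FAIR")
    .set("spark.scheduler.allocation.file", "conf/fairscheduler.xml")
  val sc = new SparkContext(conf)

  // Greedy, less important job, submitted into the "greedy" pool.
  sc.setLocalProperty("spark.scheduler.pool", "greedy")
  sc.parallelize(1 to 1000000, 1000).map(_ * 2).count()

  // High-importance job, submitted (e.g. from another thread, which would
  // set its own local property) into the "high" pool.
  sc.setLocalProperty("spark.scheduler.pool", "high")
  sc.parallelize(1 to 1000, 10).map(_ + 1).count()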

The thing I'm trying to understand is the following.
I use a standalone Spark deployment (no YARN, no Mesos).
Let's say the less important job has taken all the cores, and then someone
runs a high-importance job. I see three possibilities:
1. Spark kills some executors that are currently processing partitions of
the less important job and reassigns them to the high-importance job.
2. Spark waits until some partitions of the less important job are
completely processed, and the first executors that become free are then
assigned to the high-importance job.
3. Spark figures out when particular stages over partitions of the less
important job are done and, instead of continuing with that job, reassigns
those executors to the high-importance one.

Which one is it? Could you please point me to the relevant class / method /
line of code?
--
Be well!
Jean Morozov
