Chen,

The first one will launch 8 single-threaded JVMs and the second one
will launch one 8-threaded JVM.
Performance depends on your data. If your data is small, the second one
is better, because in the first case the launch time of 8 JVMs is a
significant cost. Also, if you have broadcast anything, it will have to
be sent to all 8 JVMs.
However, if you have a lot of data to process, the first one is better
because (i) the JVM launch time becomes negligible, and (ii) you will
have 8 times the memory available for processing.
Assumption made: all machines have the same memory and computing power.
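
As a rough sketch, the two options from the question could be written
like this in conf/spark-env.sh on each machine. This assumes the
standalone deploy mode and that the variables are exported from that
file; option (b) is shown commented out so only one is active.

```shell
# conf/spark-env.sh (sketch; these settings apply per machine)

# Option (a): 8 worker JVMs per machine, 1 core each
export SPARK_WORKER_INSTANCES=8
export SPARK_WORKER_CORES=1

# Option (b): 1 worker JVM per machine, using all 8 cores
# export SPARK_WORKER_INSTANCES=1
# export SPARK_WORKER_CORES=8
```

When running more than one worker instance per machine, set
SPARK_WORKER_CORES explicitly; otherwise each worker will try to use
all the cores on the machine.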

"""(a) gives us 40 workers with each core per worker (b) gives 8 workers
while each worker has eight cores. Any advice on which better would
lead to better performance?"""

No. On each machine, (a) gives you 8 workers with one core per worker,
and (b) gives you 1 worker with eight cores.
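
Scaled up to the whole cluster, assuming the 5 eight-core machines from
the question, the arithmetic can be sketched in shell as follows (the
variable names are illustrative only):

```shell
machines=5

# Option (a): 8 worker instances per machine, 1 core each
workers_a=$((machines * 8))
cores_a=$((workers_a * 1))

# Option (b): 1 worker instance per machine, 8 cores each
workers_b=$((machines * 1))
cores_b=$((workers_b * 8))

echo "(a): $workers_a workers, $cores_a cores in total"
echo "(b): $workers_b workers, $cores_b cores in total"
```

Both layouts expose the same total number of cores; they differ only in
how many JVMs those cores are split across.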

Let me know if you have any doubts.

Thanks and Regards,
Archit Thakur.



On Sun, Jan 26, 2014 at 5:58 AM, Chen Jin <[email protected]> wrote:

> Hi all,
>
> From spark document, we can set the number of workers by
> SPARK_WORKER_INSTANCES and the max number of cores that worker can
> take by using SPARK_WORKER_CORES, if I have 5 8-core machine, which
> one would perform better between
> (a)
>    SPARK_WORKER_INSTANCES = 8
>    SPARK_WORKER_CORES = 1
>
> and
> (b)
>    SPARK_WORKER_INSTANCES = 1
>    SPARK_WORKER_CORES = 8
>
> (a) gives us 40 workers with each core per worker (b) gives 8 workers
> while each worker has eight cores. Any advice on which better would
> lead to better performance?
>
> Thanks a lot,
>
> -chen
>
