Chen,

The first one will launch 8 single-threaded JVMs and the second one will launch 1 8-threaded JVM.

Performance depends on your data:

If your data is small, the second one is better, because the first one pays the launch time of 8 JVMs. Also, if you have broadcast anything, it will have to be sent to all 8 JVMs.

However, if you have a large amount of data to process, the first one is better, because
i. in this case you can ignore the JVM launch time, and
ii. you'll now have 8 times the memory available for processing (each worker JVM gets its own memory allocation).

Assumption made: all machines are equipped with the same memory/computing power.
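
To make the counts concrete, here is a rough sketch of the arithmetic for the 5 machines in your question. It is plain Python, not a Spark API: the function name cluster_totals and its parameters are made up for illustration, and it assumes every machine uses the same two settings.

```python
def cluster_totals(machines, worker_instances, worker_cores):
    """Return (total worker JVMs, total cores offered) across the cluster.

    Hypothetical helper for illustration only. Assumes every machine runs
    with the same SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES values.
    """
    total_workers = machines * worker_instances   # worker JVMs, cluster-wide
    total_cores = total_workers * worker_cores    # cores offered, cluster-wide
    return total_workers, total_cores


# 5 machines, as in the question
print(cluster_totals(5, 8, 1))  # option (a): (40, 40)
print(cluster_totals(5, 1, 8))  # option (b): (5, 40)
```

Both options offer the same number of cores overall; what changes is how many JVMs those cores are split across.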
"""(a) gives us 40 workers with each core per worker (b) gives 8 workers while each worker has eight cores. Any advice on which better would lead to better performance?""" No, (a) gives u 8 workers with each core per worker (b) gives 1 worker while each worker has eight cores. Let me know, if any doubts. Thanks and Regards, Archit Thakur. On Sun, Jan 26, 2014 at 5:58 AM, Chen Jin <[email protected]> wrote: > Hi all, > > From spark document, we can set the number of workers by > SPARK_WORKER_INSTANCES and the max number of cores that worker can > take by using SPARK_WORKER_CORES, if I have 5 8-core machine, which > one would perform better between > (a) > SPARK_WORKER_INSTANCES = 8 > SPARK_WORKER_CORES = 1 > > and > (b) > SPARK_WORKER_INSTANCES = 1 > SPARK_WORKER_CORES = 8 > > (a) gives us 40 workers with each core per worker (b) gives 8 workers > while each worker has eight cores. Any advice on which better would > lead to better performance? > > Thanks a lot, > > -chen >
