Hi Archit,

Thanks for the detailed explanation. Since my cluster has 5 machines, each with 8 cores and 48g of memory, I meant to say that, for the entire cluster:
(a) gives us 40 workers with one core per worker, and (b) gives 5 workers, each with eight cores.

A follow-up question: since each machine has 48g of memory, consider

(a)
SPARK_WORKER_INSTANCES = 8
SPARK_WORKER_CORES = 1
SPARK_WORKER_MEMORY = 6g

(b)
SPARK_WORKER_INSTANCES = 1
SPARK_WORKER_CORES = 8
SPARK_WORKER_MEMORY = 48g

Will setting (a) help with processing a large dataset, given that, as you said, each machine now runs 8 JVMs?

Thanks a lot,
-chen

On Sun, Jan 26, 2014 at 1:53 AM, Archit Thakur <[email protected]> wrote:
> Chen, the first one will launch 8 single-threaded JVMs and the second one
> will launch one 8-threaded JVM.
> Performance depends on your data. If your data is small, the second one is
> better because of the time it takes to launch 8 JVMs in the first case.
> Also, if you have broadcast anything, it will have to be done for all 8 JVMs.
> However, if you have quite big data to process, the first one is better
> because (i) in this case you can ignore the JVM launch time, and (ii) you'll
> now have 8 times the memory available for processing.
> Assumption made: all machines are equipped with the same memory/computing
> power.
>
> """(a) gives us 40 workers with one core per worker; (b) gives 8 workers,
> each with eight cores. Any advice on which would lead to better
> performance?"""
>
> No, (a) gives you 8 workers with one core per worker, and (b) gives 1
> worker with eight cores.
>
> Let me know if you have any doubts.
>
> Thanks and Regards,
> Archit Thakur.
>
> On Sun, Jan 26, 2014 at 5:58 AM, Chen Jin <[email protected]> wrote:
>>
>> Hi all,
>>
>> From the Spark documentation, we can set the number of workers with
>> SPARK_WORKER_INSTANCES and the maximum number of cores each worker can
>> use with SPARK_WORKER_CORES. If I have five 8-core machines, which
>> one would perform better:
>>
>> (a)
>> SPARK_WORKER_INSTANCES = 8
>> SPARK_WORKER_CORES = 1
>>
>> or
>> (b)
>> SPARK_WORKER_INSTANCES = 1
>> SPARK_WORKER_CORES = 8
>>
>> (a) gives us 40 workers with one core per worker; (b) gives 8 workers,
>> each with eight cores. Any advice on which would lead to better
>> performance?
>>
>> Thanks a lot,
>>
>> -chen
>
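For reference, here is a small sketch of what the two settings in the follow-up question add up to across the five 8-core, 48g machines. It is plain arithmetic, not a Spark API: `WorkerConfig` and `cluster_totals` are hypothetical names, and the only assumption is that each SPARK_WORKER_* value applies per worker and that every machine runs the same settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:
    """Per-machine standalone-mode settings from the thread (hypothetical helper)."""
    instances: int  # SPARK_WORKER_INSTANCES: worker JVMs per machine
    cores: int      # SPARK_WORKER_CORES: cores each worker may use
    memory_gb: int  # SPARK_WORKER_MEMORY: memory each worker may use, in GB


def cluster_totals(cfg: WorkerConfig, machines: int = 5) -> dict:
    """Multiply the per-worker settings out over the whole cluster."""
    workers = cfg.instances * machines
    return {
        "workers": workers,
        "cores_per_machine": cfg.instances * cfg.cores,
        "memory_per_machine_gb": cfg.instances * cfg.memory_gb,
        "total_cores": workers * cfg.cores,
        "total_memory_gb": workers * cfg.memory_gb,
    }


# The two settings from the follow-up question (5 machines, 8 cores / 48g each).
a = WorkerConfig(instances=8, cores=1, memory_gb=6)
b = WorkerConfig(instances=1, cores=8, memory_gb=48)

for name, cfg in (("(a)", a), ("(b)", b)):
    print(name, cluster_totals(cfg))
```

Both settings add up to 40 cores and 240g across the cluster (8 cores and 48g per machine); they differ only in how that is split: 40 worker JVMs with 1 core and 6g each, versus 5 worker JVMs with 8 cores and 48g each.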
