If it is in kB, then Spark will always schedule it to one node. As soon as it gets bigger, you will see usage of more nodes.
Hence, increase your testing dataset.

> On 11. Jun 2018, at 12:22, Aakash Basu <aakash.spark....@gmail.com> wrote:
>
> Jorn - The code is a series of feature engineering and model tuning
> operations. Too big to show. Yes, the data volume is too low, it is in KBs; I
> just tried to experiment with a small dataset before going for a large one.
>
> Akshay - I ran with your suggested Spark configurations, and I get this (the
> node changed, but the problem persists) -
>
> <image.png>
>
>> On Mon, Jun 11, 2018 at 3:16 PM, akshay naidu <akshaynaid...@gmail.com> wrote:
>>
>> Try:
>> --num-executors 3 --executor-cores 4 --executor-memory 2G --conf spark.scheduler.mode=FAIR
>>
>>> On Mon, Jun 11, 2018 at 2:43 PM, Aakash Basu <aakash.spark....@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I have submitted a job on a 4-node cluster, where I see most of the
>>> operations happening at one of the worker nodes while the other two are
>>> simply idle.
>>>
>>> The picture below sheds light on that -
>>>
>>> How do I properly distribute the load?
>>>
>>> My cluster configuration (4-node cluster [1 driver; 3 slaves]) -
>>>
>>> Cores - 6
>>> RAM - 12 GB
>>> HDD - 60 GB
>>>
>>> My spark-submit command is as follows -
>>>
>>> spark-submit --master spark://192.168.49.37:7077 --num-executors 3 \
>>>   --executor-cores 5 --executor-memory 4G \
>>>   /appdata/bblite-codebase/prima_diabetes_indians.py
>>>
>>> What to do?
>>>
>>> Thanks,
>>> Aakash.
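[Editor's note] Pulling the thread's advice together: a minimal sketch of a submit command, based on Akshay's suggested executor sizing. The `spark.default.parallelism` setting is my own addition (an assumption, not from the thread) to raise the default partition count for shuffles; the master URL and script path are the ones quoted above.

```shell
# Sketch, not a verified fix: with 3 workers of 6 cores / 12 GB each,
# 4 cores and 2G per executor (Akshay's suggestion) leaves headroom for
# the OS and the Spark daemons, whereas 5 cores / 4G per executor may
# not fit more than one executor per node.
spark-submit \
  --master spark://192.168.49.37:7077 \
  --num-executors 3 \
  --executor-cores 4 \
  --executor-memory 2G \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.default.parallelism=12 \
  /appdata/bblite-codebase/prima_diabetes_indians.py
```

Note that, as the reply above says, a KB-sized input will typically land in a single partition regardless of these settings, so the tasks still run on one node; growing the dataset (or explicitly repartitioning it inside the script) is what actually spreads the work.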