If the dataset is only a few kB, Spark will typically read it into a single 
partition and schedule all the work on one node. As soon as it gets bigger 
you will see more nodes being used.

Hence, increase the size of your test dataset.
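
A quick way to verify this is to check how many partitions Spark created; a 
kB-sized input usually lands in a single partition, which can only run on 
one executor. A minimal PySpark sketch (the input path and partition count 
below are placeholders, not taken from your job):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("partition-check").getOrCreate()

    # Placeholder path - substitute your own dataset
    df = spark.read.csv("/path/to/data.csv", header=True, inferSchema=True)

    # A kB-sized file typically ends up in a single partition
    print(df.rdd.getNumPartitions())

    # Spreading the rows over more partitions lets the other workers join in
    df = df.repartition(12)

That said, for data this small the shuffle overhead of repartitioning may 
outweigh any gain; a larger test dataset is the more meaningful experiment.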

> On 11. Jun 2018, at 12:22, Aakash Basu <aakash.spark....@gmail.com> wrote:
> 
> Jorn - The code is a series of feature engineering and model tuning 
> operations; it is too big to show here. Yes, the data volume is very low, 
> in the KBs - I just tried to experiment with a small dataset before going 
> for a large one.
> 
> Akshay - I ran with your suggested Spark configuration and I get this (the 
> node changed, but the problem persists) -
> 
> [attached screenshot: image.png]
> 
> 
> 
>> On Mon, Jun 11, 2018 at 3:16 PM, akshay naidu <akshaynaid...@gmail.com> 
>> wrote:
>> Try:
>> 
>>     --num-executors 3 --executor-cores 4 --executor-memory 2G \
>>     --conf spark.scheduler.mode=FAIR
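>> 
>> As a sketch, the full submit line with these settings (master URL and 
>> script path copied from your original command) would look like:
>> 
>>     spark-submit --master spark://192.168.49.37:7077 \
>>       --num-executors 3 --executor-cores 4 --executor-memory 2G \
>>       --conf spark.scheduler.mode=FAIR \
>>       /appdata/bblite-codebase/prima_diabetes_indians.py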
>> 
>>> On Mon, Jun 11, 2018 at 2:43 PM, Aakash Basu <aakash.spark....@gmail.com> 
>>> wrote:
>>> Hi,
>>> 
>>> I have submitted a job on a 4-node cluster, where I see most of the 
>>> operations happening on one of the worker nodes while the other two are 
>>> sitting idle.
>>> 
>>> The picture below sheds light on that -
>>> 
>>> How to properly distribute the load?
>>> 
>>> My cluster configuration (4-node cluster [1 driver; 3 slaves]) -
>>> 
>>> Cores - 6
>>> RAM - 12 GB
>>> HDD - 60 GB
>>> 
>>> My spark-submit command is as follows -
>>> 
>>> spark-submit --master spark://192.168.49.37:7077 \
>>>   --num-executors 3 --executor-cores 5 --executor-memory 4G \
>>>   /appdata/bblite-codebase/prima_diabetes_indians.py
>>> 
>>> What to do?
>>> 
>>> Thanks,
>>> Aakash.
>> 
> 
