If the dataset is only a few kB, Spark will typically read it into a single 
partition and schedule all the work on one node. As soon as it gets bigger 
you will see more nodes being used.

Hence, increase the size of your test dataset.
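
A quick way to verify this is to check how many partitions Spark created; a 
kB-sized input usually lands in a single partition, which can only run on 
one executor. A minimal PySpark sketch (the input path and partition count 
below are placeholders, not taken from your job):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("partition-check").getOrCreate()

    # Placeholder path - substitute your own dataset
    df = spark.read.csv("/path/to/data.csv", header=True, inferSchema=True)

    # A kB-sized file typically ends up in a single partition
    print(df.rdd.getNumPartitions())

    # Spreading the rows over more partitions lets the other workers join in
    df = df.repartition(12)

That said, for data this small the shuffle overhead of repartitioning may 
outweigh any gain; a larger test dataset is the more meaningful experiment.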

> On 11. Jun 2018, at 12:22, Aakash Basu <aakash.spark....@gmail.com> wrote:
> 
> Jorn - The code is a series of feature engineering and model tuning 
> operations; it is too big to show here. Yes, the data volume is very low, 
> in the KBs - I just tried to experiment with a small dataset before going 
> for a large one.
> 
> Akshay - I ran with your suggested Spark configuration and I get this (the 
> node changed, but the problem persists) -
> 
> [attached screenshot: image.png]
> 
> 
> 
>> On Mon, Jun 11, 2018 at 3:16 PM, akshay naidu <akshaynaid...@gmail.com> 
>> wrote:
>> Try:
>> 
>>     --num-executors 3 --executor-cores 4 --executor-memory 2G \
>>     --conf spark.scheduler.mode=FAIR
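>> 
>> As a sketch, the full submit line with these settings (master URL and 
>> script path copied from your original command) would look like:
>> 
>>     spark-submit --master spark://192.168.49.37:7077 \
>>       --num-executors 3 --executor-cores 4 --executor-memory 2G \
>>       --conf spark.scheduler.mode=FAIR \
>>       /appdata/bblite-codebase/prima_diabetes_indians.py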
>> 
>>> On Mon, Jun 11, 2018 at 2:43 PM, Aakash Basu <aakash.spark....@gmail.com> 
>>> wrote:
>>> Hi,
>>> 
>>> I have submitted a job on a 4-node cluster, where I see most of the 
>>> operations happening on one of the worker nodes while the other two are 
>>> sitting idle.
>>> 
>>> The picture below sheds light on that -
>>> 
>>> How to properly distribute the load?
>>> 
>>> My cluster configuration (4-node cluster [1 driver; 3 slaves]) -
>>> 
>>> Cores - 6
>>> RAM - 12 GB
>>> HDD - 60 GB
>>> 
>>> My spark-submit command is as follows -
>>> 
>>> spark-submit --master spark://192.168.49.37:7077 \
>>>   --num-executors 3 --executor-cores 5 --executor-memory 4G \
>>>   /appdata/bblite-codebase/prima_diabetes_indians.py
>>> 
>>> What to do?
>>> 
>>> Thanks,
>>> Aakash.
>> 
> 
