Aakash,

Like Jörn suggested, did you increase your test data set? If so, did you also update your executor-memory setting? It seems like you might be exceeding the executor memory threshold.
Thanks,
Vamshi Talla

On Jun 11, 2018, at 8:54 AM, Aakash Basu <aakash.spark....@gmail.com> wrote:

Hi Jörn/Others,

Thanks for your help. Now the data is being distributed in a proper way, but the challenge is that after a certain point I'm getting this error, after which everything stops moving ahead:

2018-06-11 18:14:56 ERROR TaskSchedulerImpl:70 - Lost executor 0 on 192.168.49.39: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

[screenshot omitted]

How to avoid this scenario?

Thanks,
Aakash.

On Mon, Jun 11, 2018 at 4:16 PM, Jörn Franke <jornfra...@gmail.com> wrote:

If it is in KB then Spark will always schedule it to one node. As soon as it gets bigger you will see usage of more nodes. Hence, increase your testing dataset.

On 11. Jun 2018, at 12:22, Aakash Basu <aakash.spark....@gmail.com> wrote:

Jörn - The code is a series of feature engineering and model tuning operations, too big to show. Yes, the data volume is very low, it is in KBs; I just tried to experiment with a small dataset before going for a large one.
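[Editor's note] Acting on the executor-memory suggestion above would mean resubmitting with a larger `--executor-memory` and, where relevant, an explicit off-heap overhead. A minimal sketch reusing the master URL and script path quoted in this thread; the 6G and 1g values are illustrative assumptions, not a recommendation, and must be tuned to the cluster's actual free RAM:

```shell
# Illustrative values only -- size them to what each worker actually has free.
spark-submit \
  --master spark://192.168.49.37:7077 \
  --num-executors 3 \
  --executor-cores 5 \
  --executor-memory 6G \
  --conf spark.executor.memoryOverhead=1g \
  /appdata/bblite-codebase/prima_diabetes_indians.py
```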
Akshay - I ran with your suggested Spark configurations and I get this (the node changed, but the problem persists):

[screenshot omitted]

On Mon, Jun 11, 2018 at 3:16 PM, akshay naidu <akshaynaid...@gmail.com> wrote:

try --num-executors 3 --executor-cores 4 --executor-memory 2G --conf spark.scheduler.mode=FAIR

On Mon, Jun 11, 2018 at 2:43 PM, Aakash Basu <aakash.spark....@gmail.com> wrote:

Hi,

I have submitted a job on a 4-node cluster, where I see most of the operations happening at one of the worker nodes while the other two are simply chilling out. The picture below puts light on that:

[screenshot omitted]

How to properly distribute the load?

My cluster conf (4-node cluster [1 driver; 3 slaves]):
Cores - 6
RAM - 12 GB
HDD - 60 GB

My spark-submit command is as follows:

spark-submit --master spark://192.168.49.37:7077 --num-executors 3 --executor-cores 5 --executor-memory 4G /appdata/bblite-codebase/prima_diabetes_indians.py

What to do?

Thanks,
Aakash.
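[Editor's note] A quick sanity check on the sizing discussed in this thread: each worker has 12 GB of RAM, and an executor's heap plus its off-heap overhead (by default max(384 MB, 10% of the heap) in Spark) must fit in whatever the OS and the Spark daemons leave free. A rough budgeting sketch in Python; the 1 GB OS reservation and the flat 10% overhead factor are simplifying assumptions, not figures from the thread:

```python
# Rough per-node executor heap budget for a Spark standalone worker.
# Assumptions (not from the thread): ~1 GB reserved for OS/daemons,
# overhead modeled as a flat 10% of the executor heap.

def max_executor_memory_gb(node_ram_gb, executors_per_node,
                           os_reserved_gb=1.0, overhead_factor=0.10):
    """Largest --executor-memory (GB) that fits, accounting for overhead."""
    usable = node_ram_gb - os_reserved_gb
    per_executor = usable / executors_per_node
    # Heap plus overhead must fit: heap * (1 + factor) <= per_executor
    return per_executor / (1.0 + overhead_factor)

# One executor on a 12 GB worker tops out around 10 GB of heap.
print(round(max_executor_memory_gb(12, 1), 2))
```

By this estimate the 4G (or Akshay's 2G) setting fits comfortably on a 12 GB worker, which is consistent with Jörn's point that the tiny dataset, not the heap size, is why everything lands on one node.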