Hello all,

I am new to Spark and have been working on a small project that tries to
tackle the straggler problem. I ran some SQL queries (GROUP BY) on a small
cluster and observed that some tasks take several minutes while others
finish in seconds.

I know that Spark already has a speculation mode, but I still see this
problem with speculation turned on. Therefore, I modified the code to kill
the stragglers instead of re-executing them, trading accuracy for speed.
As expected, killing stragglers causes the job to hang because of the lost
tasks. Can anyone give some guidance on getting this to work? Is it
possible to terminate some tasks early without affecting the overall
execution of the job, at some cost in accuracy?
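
To make the question concrete, here is a toy sketch in plain Python (not
Spark code, and not my actual modification) of the behaviour I am after:
wait for tasks up to a deadline, aggregate whatever finished, and drop the
rest. The names partial_sum, approximate_total and deadline_s are made up
for illustration, and the delays only simulate a straggler.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def partial_sum(values, delay):
    """Hypothetical task: sum one partition after an artificial delay."""
    time.sleep(delay)
    return sum(values)


def approximate_total(partitions, delays, deadline_s):
    """Sum the partitions that finish before the deadline; skip the rest."""
    pool = ThreadPoolExecutor(max_workers=len(partitions))
    futures = [pool.submit(partial_sum, p, d) for p, d in zip(partitions, delays)]
    # Block until every task is done or the deadline passes, whichever is first.
    done, not_done = wait(futures, timeout=deadline_s)
    # Do not block on stragglers. Python threads cannot be force-killed, so
    # they finish in the background and their results are simply ignored.
    pool.shutdown(wait=False)
    total = sum(f.result() for f in done)
    return total, len(done), len(not_done)


partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
delays = [0.0, 0.0, 1.5]  # the third task plays the straggler
total, finished, dropped = approximate_total(partitions, delays, deadline_s=0.5)
print(total, finished, dropped)  # exact sum would be 45
```

Here the job returns a partial answer from two of the three partitions
instead of hanging on the third. What I cannot figure out is how to get the
equivalent behaviour inside Spark, where the stage seems to wait for every
task to complete.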

Appreciate your help!

-- 
Jia Zhan
