This may be due to data skew.

On Thu, Jun 16, 2016 at 12:45 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
> This SO question was asked about a year ago.
>
> http://stackoverflow.com/questions/31799755/how-to-deal-with-tasks-running-too-long-comparing-to-others-in-job-in-yarn-cli
>
> I answered this question with a suggestion to try speculation, but it
> doesn't quite do what the OP expects. I have been running into this issue
> more often these days. Out of 5000 tasks, 4950 complete in 5 mins, but the
> last 50 never really complete; I have tried waiting for 4 hrs. This could be
> a memory issue, or maybe it is the way Spark's fine-grained mode works with
> Mesos. I am trying to enable JmxSink to get a heap dump.
>
> But in the meantime, is there a better fix for this (in any version of
> Spark; I am using 1.5.1 but can upgrade)? It would be great if the last 50
> tasks in my example could be killed (timed out) and the stage could still
> complete successfully.
>
> --
> Thanks,
> -Utkarsh
>

--
Best Regards

Jeff Zhang
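One way to test the data-skew explanation is to compare per-partition record counts for the stage's input. In PySpark these counts can be gathered with `rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()`. The function below is a minimal, Spark-independent sketch that flags partitions far larger than the median. The name `find_skewed_partitions` and the 10x threshold are hypothetical choices for illustration, not Spark settings.

```python
from statistics import median
from typing import List, Tuple


def find_skewed_partitions(sizes: List[int], factor: float = 10.0) -> List[Tuple[int, int]]:
    """Return (partition_index, record_count) pairs whose count exceeds
    `factor` times the median partition size.

    `factor` is a hypothetical, illustrative threshold (not a Spark config).
    """
    if not sizes:
        return []
    med = median(sizes)
    # If most partitions are empty the median is 0; then any
    # non-empty partition is treated as standing out.
    threshold = factor * med if med > 0 else 0
    return [(i, n) for i, n in enumerate(sizes) if n > threshold]


# Made-up sizes shaped like the 4950-fast / 50-slow split described above.
sizes = [1000] * 4950 + [500000] * 50
skewed = find_skewed_partitions(sizes)
print(len(skewed), skewed[0])
```

If a handful of partitions dominate like this, speculation is unlikely to help: a speculative copy re-runs the same task on the same oversized partition, so it is just as slow as the original.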