Karthik Kambatla commented on YARN-2313:

Thanks for the explanation, [~ozawa]. I see the issue clearly now. 

In that case, a better approach might be to have a single "maintenance" thread 
that periodically executes a bunch of runnables (reload, update, 
continuous-scheduling) serially. Otherwise, as we add more threads that hold 
onto the scheduler lock, it will be hairy to tune all of them so the scheduler 
can make some meaningful progress. 
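
To make the suggestion concrete, here is a rough sketch of what such a maintenance thread could look like. It is not taken from any of the attached patches: the class name, the register/runOnce/start methods and the schedulerLock parameter are all made up for illustration. The one real API it leans on is ScheduledExecutorService#scheduleWithFixedDelay, which measures the delay from the end of one pass to the start of the next, so a slow pass cannot turn into a busy loop.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a single maintenance thread; not FairScheduler code. */
public class MaintenanceThreadSketch {
    // Tasks such as reload, update, continuous-scheduling (names assumed).
    private final List<Runnable> tasks = new CopyOnWriteArrayList<>();

    // One daemon thread runs every task, so tasks never compete with each other.
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "scheduler-maintenance");
            t.setDaemon(true);
            return t;
        });

    // Stand-in for the scheduler lock; an assumption for this sketch.
    private final Object schedulerLock;

    public MaintenanceThreadSketch(Object schedulerLock) {
        this.schedulerLock = schedulerLock;
    }

    public void register(Runnable task) {
        tasks.add(task);
    }

    /** Runs every registered task once, serially, in registration order. */
    public void runOnce() {
        for (Runnable task : tasks) {
            try {
                // Take the lock per task so other threads can get in between tasks.
                synchronized (schedulerLock) {
                    task.run();
                }
            } catch (RuntimeException e) {
                // A failing task must not cancel the periodic schedule.
                System.err.println("maintenance task failed: " + e);
            }
        }
    }

    /** Starts periodic passes; the delay counts from the end of the previous pass. */
    public void start(long delayMs) {
        executor.scheduleWithFixedDelay(
            this::runOnce, delayMs, delayMs, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        executor.shutdownNow();
    }
}
```

With something like this there is a single interval to tune, and the lock is guaranteed to be free for at least delayMs between passes no matter how long update() takes.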

> Livelock can occur in FairScheduler when there are lots of running apps
> -----------------------------------------------------------------------
>                 Key: YARN-2313
>                 URL: https://issues.apache.org/jira/browse/YARN-2313
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.4.1
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>             Fix For: 2.6.0
>         Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, 
> YARN-2313.4.patch, rm-stack-trace.txt
> Observed livelock in FairScheduler when there are lots of entries in the queue. After 
> investigating the code, I found that the following case can occur:
> 1. {{update()}}, called by UpdateThread, takes longer than 
> UPDATE_INTERVAL (500ms) if there are lots of queues.
> 2. UpdateThread goes into a busy loop.
> 3. Other threads (AllocationFileReloader, 
> ResourceManager$SchedulerEventDispatcher) can wait forever.
