Karthik Kambatla commented on YARN-2313:

Actually, thinking more about it, I don't quite understand how the 
update thread can go into a busy loop. Thread.sleep() and update() are called 
serially. So, irrespective of how long update() takes, the next Thread.sleep() 
is still called for 500 ms, no? 

It is possible that these 500 ms are not enough for other work and the 
scheduler lags, but it should still make progress. 
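
To make the argument concrete, here is a minimal sketch of a sleep-then-update 
loop. It is not the actual FairScheduler code: the class name UpdateLoopSketch, 
the runLoop() helper, and the simulated update() cost are all hypothetical, and 
update() is just a stand-in that sleeps. It only illustrates that, when the two 
calls run serially, every iteration still pays the full sleep interval even if 
update() takes longer than that interval.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the FairScheduler implementation.
public class UpdateLoopSketch {
    // Mirrors the 500 ms interval mentioned in the issue description.
    static final long UPDATE_INTERVAL_MS = 500;

    // Stand-in for the scheduler's update(); simulates work by sleeping.
    static void update(long costMs) {
        try {
            Thread.sleep(costMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs sleep() and update() serially and records how long each sleep lasted.
    static List<Long> runLoop(int iterations, long updateCostMs, long intervalMs) {
        List<Long> sleepGapsMs = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop the loop if the thread is interrupted
            }
            sleepGapsMs.add((System.nanoTime() - start) / 1_000_000);
            update(updateCostMs); // may take longer than intervalMs
        }
        return sleepGapsMs;
    }

    public static void main(String[] args) {
        // update() costs 1200 ms, well above the 500 ms interval.
        List<Long> gaps = runLoop(3, 1200, UPDATE_INTERVAL_MS);
        System.out.println("Sleep gaps between updates (ms): " + gaps);
    }
}
```

In this sketch the loop never spins without sleeping, so other threads get 
roughly one interval per iteration to run. Whether that is enough time for 
them in practice is a separate question.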

> Livelock can occur in FairScheduler when there are lots of running apps
> -----------------------------------------------------------------------
>                 Key: YARN-2313
>                 URL: https://issues.apache.org/jira/browse/YARN-2313
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.4.1
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>             Fix For: 2.6.0
>         Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, 
> YARN-2313.4.patch, rm-stack-trace.txt
> Observed livelock on FairScheduler when there are lots of entries in the queue. 
> After investigating the code, I found that the following case can occur:
> 1. {{update()}} called by UpdateThread takes longer than 
> UPDATE_INTERVAL (500 ms) if there are lots of queues.
> 2. UpdateThread goes into a busy loop.
> 3. Other threads (AllocationFileReloader, 
> ResourceManager$SchedulerEventDispatcher) can wait forever.

This message was sent by Atlassian JIRA
