[ 
https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2526:
-----------------------------------
    Summary: SLS can deadlock when all the threads are taken by AMSimulators  
(was: Scheduler Load Simulator may enter deadlock if lots of applications 
submitted to the RM at the same time)

> SLS can deadlock when all the threads are taken by AMSimulators
> ---------------------------------------------------------------
>
>                 Key: YARN-2526
>                 URL: https://issues.apache.org/jira/browse/YARN-2526
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler-load-simulator
>    Affects Versions: 2.5.1
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>            Priority: Minor
>         Attachments: YARN-2526-1.patch
>
>
> The simulation can deadlock when the AM simulators occupy every thread in 
> the thread pool and each one waits for its AM container to be allocated. In 
> that case, all AM simulators wait for the NM simulators to heartbeat so that 
> resources can be allocated, while all NM simulators wait for an AM simulator 
> to release a thread. Neither side can make progress.
> To resolve this deadlock, we need to remove the while() loop in MRAMSimulator:
> {code}
>     // waiting until the AM container is allocated
>     while (true) {
>       if (response != null && ! response.getAllocatedContainers().isEmpty()) {
>         // get AM container
>         .....
>         break;
>       }
>       // this sleep time is different from HeartBeat
>       Thread.sleep(1000);
>       // send out empty request
>       sendContainerRequest();
>       response = responseQueue.take();
>     }
> {code}
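
For illustration only, here is a minimal sketch of one way such a wait could be restructured so that a simulator never holds a pool thread while it waits. It is not the attached patch and does not use the real SLS classes: AmTask, runSimulation, the String "container" values and the use of a ScheduledExecutorService are all assumptions made for this example. Each run does a non-blocking poll() on its response queue and, if nothing has arrived, reschedules itself and returns, so the task standing in for the NM heartbeat can still get a thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NonBlockingAmSketch {

    /** Hypothetical stand-in for an AM simulator; not the real MRAMSimulator. */
    static final class AmTask implements Runnable {
        private final BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
        private final ScheduledExecutorService pool;
        private final CountDownLatch done;
        private final AtomicInteger allocated;

        AmTask(ScheduledExecutorService pool, CountDownLatch done, AtomicInteger allocated) {
            this.pool = pool;
            this.done = done;
            this.allocated = allocated;
        }

        @Override
        public void run() {
            // poll() returns null immediately; take() would block the pool thread.
            String container = responseQueue.poll();
            if (container != null) {
                allocated.incrementAndGet();
                done.countDown();
                return;
            }
            // Nothing allocated yet: hand the thread back and retry later.
            pool.schedule(this, 10, TimeUnit.MILLISECONDS);
        }
    }

    /** Runs numAms AM tasks on poolSize threads; returns how many got a container. */
    static int runSimulation(int poolSize, int numAms) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(numAms);
        AtomicInteger allocated = new AtomicInteger();
        List<AmTask> ams = new ArrayList<>();
        for (int i = 0; i < numAms; i++) {
            AmTask am = new AmTask(pool, done, allocated);
            ams.add(am);
            pool.schedule(am, 0, TimeUnit.MILLISECONDS);
        }
        // Stand-in for an NM heartbeat: shares the same pool as the AM tasks
        // and hands one container to every AM when it finally runs.
        pool.schedule(() -> {
            for (int i = 0; i < ams.size(); i++) {
                ams.get(i).responseQueue.add("container-" + i);
            }
        }, 50, TimeUnit.MILLISECONDS);

        done.await(5, TimeUnit.SECONDS);
        pool.shutdownNow();
        return allocated.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("AMs with a container: " + runSimulation(2, 8));
    }
}
```

With 2 threads and 8 AM tasks, the heartbeat stand-in still gets to run because no AM task ever parks on a pool thread. If run() blocked in take() instead, the first two AM tasks would hold both threads and the heartbeat task would never be scheduled.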



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
