[ https://issues.apache.org/jira/browse/YARN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128354#comment-14128354 ]
Hudson commented on YARN-2526:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #676 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/676/])
YARN-2526. SLS can deadlock when all the threads are taken by AMSimulators. (Wei Yan via kasha) (kasha: rev 28d99db99236ff2a6e4a605802820e2b512225f9)
* hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java
* hadoop-yarn-project/CHANGES.txt


> SLS can deadlock when all the threads are taken by AMSimulators
> ---------------------------------------------------------------
>
>                 Key: YARN-2526
>                 URL: https://issues.apache.org/jira/browse/YARN-2526
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler-load-simulator
>    Affects Versions: 2.5.1
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>            Priority: Critical
>             Fix For: 2.6.0
>
>         Attachments: YARN-2526-1.patch
>
>
> The simulation may deadlock if the application simulators hold all the
> threads provided by the thread pool and all of them wait for AM container
> allocation. In that case, every AM simulator waits for the NM simulators to
> heartbeat so that resources can be allocated, and every NM simulator waits
> for an AM simulator to release a thread. The simulator is deadlocked.
> To resolve this deadlock, we need to remove the while() loop in
> MRAMSimulator.
> {code}
>     // waiting until the AM container is allocated
>     while (true) {
>       if (response != null && !response.getAllocatedContainers().isEmpty()) {
>         // get AM container
>         .....
>         break;
>       }
>       // this sleep time is different from HeartBeat
>       Thread.sleep(1000);
>       // send out empty request
>       sendContainerRequest();
>       response = responseQueue.take();
>     }
> {code}
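
For readers following the description above, here is a minimal sketch of one way to avoid holding a pool thread while waiting for the AM container. It is not the committed patch. The class AmStepSketch, its methods, and the use of an Integer (the number of allocated containers) in place of the real response object are all hypothetical stand-ins. The idea is that each scheduled run does a single non-blocking check and returns, so the thread goes back to the pool and NM simulators can heartbeat.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Hypothetical stand-in for an AM simulator. This is NOT the real
 * MRAMSimulator; all names here are assumptions for illustration.
 */
public class AmStepSketch {
    // Stand-in for the response queue: each entry is the number of
    // containers allocated in one (simulated) response.
    private final Queue<Integer> responseQueue = new ArrayDeque<>();
    private boolean amContainerRunning = false;
    private int requestsSent = 0;

    /** Called by the simulated RM/NM side once a heartbeat produces a response. */
    public void deliverResponse(int allocatedContainers) {
        responseQueue.add(allocatedContainers);
    }

    /**
     * One scheduled run. It never sleeps and never blocks on the queue,
     * so the calling pool thread is released as soon as it returns.
     */
    public void step() {
        if (amContainerRunning) {
            return; // AM container already obtained; nothing left to wait for
        }
        Integer allocated = responseQueue.poll(); // non-blocking, unlike take()
        if (allocated != null && allocated > 0) {
            amContainerRunning = true; // "get AM container"
            return;
        }
        // No allocation yet: resend the request and try again on the next run.
        requestsSent++; // stands in for sendContainerRequest()
    }

    public boolean isAmContainerRunning() {
        return amContainerRunning;
    }

    public int getRequestsSent() {
        return requestsSent;
    }
}
```

In this sketch the retry comes from the scheduler invoking step() again on the next tick, rather than from a loop inside the task, so no AM simulator occupies a thread between checks.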