[ https://issues.apache.org/jira/browse/SPARK-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin resolved SPARK-17667.
------------------------------------
    Resolution: Won't Fix

PR was abandoned. Let's close this one.

> Make locking fine grained in YarnAllocator#enqueueGetLossReasonRequest
> ----------------------------------------------------------------------
>
>                 Key: SPARK-17667
>                 URL: https://issues.apache.org/jira/browse/SPARK-17667
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Ashwin Shankar
>            Priority: Major
>
> Following up on the discussion in SPARK-15725, one of the reasons for the AM
> hanging with dynamic allocation (DA) is the way locking is done in
> YarnAllocator. We noticed that when executors go down during the shrink phase
> of DA, the AM gets locked up. On taking a thread dump, we see threads trying
> to get the loss reason via YarnAllocator#enqueueGetLossReasonRequest, and
> they are all BLOCKED waiting for the lock acquired by the allocate call. This
> gets worse when the number of executors going down is in the thousands, and
> I've seen the AM hang on the order of minutes. This JIRA was created to make
> the locking a little more fine grained by remembering the executors that were
> killed via the AM, and then serving the GetExecutorLossReason requests with
> that information.
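
For readers unfamiliar with the proposal, the idea in the description (remember which executors the AM itself killed, and answer loss-reason requests from that record instead of waiting on the allocate lock) can be sketched roughly as follows. This is an illustrative Python sketch only, not Spark's YarnAllocator and not the abandoned PR: the class AllocatorSketch, its methods, and the reason strings are all hypothetical names chosen for the example.

```python
import threading


class AllocatorSketch:
    """Hypothetical stand-in for an allocator; not Spark's YarnAllocator."""

    def __init__(self):
        # Coarse lock held for the whole (potentially slow) allocate() pass.
        self._allocate_lock = threading.Lock()
        # Separate, short-lived lock guarding only the bookkeeping below.
        self._state_lock = threading.Lock()
        self._killed_by_am = set()  # executors the AM itself asked to kill
        self._pending = []          # loss-reason requests awaiting allocate()

    def kill_executor(self, executor_id):
        # Remember that this loss was initiated by the AM.
        with self._state_lock:
            self._killed_by_am.add(executor_id)

    def get_loss_reason(self, executor_id):
        # Fast path: answer from the remembered set without touching the
        # allocate lock, so callers do not block behind allocate().
        with self._state_lock:
            if executor_id in self._killed_by_am:
                return "killed by AM"
            # Slow path: defer until a later allocate() reports a reason.
            self._pending.append(executor_id)
            return None

    def allocate(self, completed):
        # `completed` maps executor id -> reason and stands in for what the
        # resource manager would report. Any slow work would happen under
        # _allocate_lock only; _state_lock is held just for the bookkeeping.
        with self._allocate_lock:
            with self._state_lock:
                answered = {e: completed[e] for e in self._pending if e in completed}
                self._pending = [e for e in self._pending if e not in completed]
            return answered
```

With this split, a call such as get_loss_reason("exec-1") for an executor previously passed to kill_executor("exec-1") returns immediately even while another thread is inside allocate(), which is the contention the thread dumps in the description point at. Requests for executors the AM did not kill still wait for a later allocate() pass.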