[jira] [Resolved] (SPARK-17667) Make locking fine grained in YarnAllocator#enqueueGetLossReasonRequest

Marcelo Vanzin (JIRA) Tue, 12 Feb 2019 13:43:52 -0800


     [ https://issues.apache.org/jira/browse/SPARK-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Marcelo Vanzin resolved SPARK-17667.
------------------------------------
    Resolution: Won't Fix

PR was abandoned. Let's close this one.

> Make locking fine grained in YarnAllocator#enqueueGetLossReasonRequest
> ----------------------------------------------------------------------
>
>                 Key: SPARK-17667
>                 URL: https://issues.apache.org/jira/browse/SPARK-17667
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Ashwin Shankar
>            Priority: Major
>
> Following up on the discussion in SPARK-15725, one of the reasons for the AM
> hanging with dynamic allocation (DA) is the way locking is done in
> YarnAllocator. We noticed that when executors go down during the shrink phase
> of DA, the AM gets locked up. Thread dumps show threads trying to get the
> loss reason via YarnAllocator#enqueueGetLossReasonRequest, and they are
> all BLOCKED waiting for the lock held by the allocate call. This gets worse
> when the number of executors going down is in the thousands; I've seen the AM
> hang on the order of minutes. This JIRA proposes making the locking a little
> more fine-grained by remembering the executors that were killed via the AM and
> then serving the GetExecutorLossReason requests with that information.
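The proposal above can be illustrated with a minimal, language-neutral sketch, written in Python rather than Spark's Scala. All names here (`AllocatorSketch`, `kill_executor`, `get_loss_reason`, `allocate`, the `"killed by AM"` string) are hypothetical and do not correspond to Spark's actual YarnAllocator API. The only assumption taken from the description is that one coarse lock is held for the whole allocate pass; executors killed by the AM are recorded under a separate small lock, so loss-reason lookups for them never wait on that coarse lock.

```python
import threading
from collections import defaultdict


class AllocatorSketch:
    """Hypothetical model: a coarse allocator lock plus a fine-grained
    record of executors the AM itself killed (not Spark's real API)."""

    def __init__(self):
        # Coarse lock: held for the whole (potentially slow) allocate() pass.
        self._lock = threading.Lock()
        # Pending loss-reason callbacks keyed by executor id; guarded by _lock.
        self._pending = defaultdict(list)
        # Executors killed via the AM, guarded by their own small lock so
        # that lookups never contend with allocate().
        self._killed_lock = threading.Lock()
        self._killed_by_am = set()

    def kill_executor(self, executor_id):
        # Remember that the AM asked for this executor to be killed.
        with self._killed_lock:
            self._killed_by_am.add(executor_id)

    def get_loss_reason(self, executor_id, callback):
        # Fast path: answer immediately for executors the AM killed,
        # without touching the coarse lock.
        with self._killed_lock:
            if executor_id in self._killed_by_am:
                callback(executor_id, "killed by AM")
                return
        # Slow path: the cause is unknown, so queue the request under the
        # coarse lock; a later allocate() pass answers it.
        with self._lock:
            self._pending[executor_id].append(callback)

    def allocate(self, completed):
        # `completed` maps executor id -> exit reason reported by the RM
        # (hypothetical stand-in for completed-container reports).
        with self._lock:
            for executor_id, reason in completed.items():
                for cb in self._pending.pop(executor_id, []):
                    cb(executor_id, reason)
```

With this split, only losses the AM cannot explain queue behind the allocate pass, which is where the thousands of BLOCKED threads described above would otherwise pile up. A real implementation would also need to prune the killed-executor set once those executors' exits have been processed, so it does not grow without bound.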



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
