Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/6662#issuecomment-109503080
  
    @andrewor14  @vanzin I drew a simple diagram of the call stack:
    
![image](https://cloud.githubusercontent.com/assets/7404824/8017792/df6f4cf8-0c32-11e5-90ff-7192d30b8d3f.png)
    
    When the `doRequestTotalExecutors` logic runs, it resets the application's total executor count.
    But there is a problem: if another executor also goes down at that moment, Spark will never bring it back up again.
    This simple scenario can reproduce the issue:
    There are 2 applications, each wanting 2 executors, so 4 CPU cores are wanted in total (each executor needs one core). But the RM only has 3 cores, so the first application (A) gains 2 cores while the second application (B) gains only one core and waits for A to release a core.
    Then kill one of A's executors: B will bring up its second executor, and A now waits for the resource.
    After the `TimeOut` logic occurs in A, application B finishes its job and releases its resources.
    The expectation is that A will bring up its other executor again, but in fact that never happens.
    A may be a Streaming application.
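    To make the sequence above concrete, here is a hedged, much-simplified model (plain Python, not Spark code; the class and method names are hypothetical) of how a per-application executor target can be silently lowered by the timeout path, so a lost executor is never re-requested:

    ```python
    # Simplified model of the reported behavior, NOT the actual Spark
    # implementation: the app tracks a "total executors" target that the
    # RM uses to decide how many executors to start. The timeout path
    # (modeling doRequestTotalExecutors) overwrites that target with the
    # current live count, so a previously lost executor is forgotten.

    class App:
        def __init__(self, wanted):
            self.target = wanted   # executors the app asks the RM for
            self.live = 0          # executors currently running

        def grant(self, n):
            self.live += n

        def timeout_shrink(self):
            # models the TimeOut logic resetting the total downward
            self.target = self.live

        def lose_executor(self):
            self.live -= 1

        def missing(self):
            # executors the RM would still try to start for this app
            return max(0, self.target - self.live)

    a = App(wanted=2)
    a.grant(2)            # A gains both executors
    a.lose_executor()     # one of A's executors is killed; B takes the core
    a.timeout_shrink()    # TimeOut logic in A resets target to live (1)
    # Later B finishes and frees a core, but A never asks again:
    print(a.missing())    # 0 -- the lost executor is never re-requested
    ```

    Under this model, `missing()` stays 0 even after resources free up, which matches the "A will never bring up its other executor" outcome described above.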

