GitHub user lirui-intel commented on the pull request:

    https://github.com/apache/spark/pull/892#issuecomment-44356508
  
    If I understand correctly, the application cannot control how many 
executors are launched (at least in standalone mode). New executors can be 
launched for the application whenever the standalone master schedules 
resources, so it's difficult for the user to determine when they have "enough" 
executors.
    
    Another possible way to solve this is for the TaskScheduler, when it sees 
new executors/hosts in a resource offer, to inform the TaskSetManager to 
re-compute its pending lists. TaskSetManager currently handles events like 
"executor lost"; maybe it should also handle "executor added"? A rough sketch 
of what I mean follows.
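    
    To make this concrete, here is a minimal, self-contained sketch of the 
re-bucketing I have in mind. All names here (`Task`, `SimpleTaskSetManager`, 
`executorAdded`, etc.) are illustrative stand-ins, not Spark's actual API:
    
    ```scala
    import scala.collection.mutable
    
    // Simplified stand-ins for Spark's TaskSetManager internals, just to
    // illustrate the idea; none of these names are the real Spark API.
    case class Task(id: Int, preferredHosts: Seq[String])
    
    class SimpleTaskSetManager(tasks: Seq[Task], initialHosts: Set[String]) {
      private val taskById = tasks.map(t => t.id -> t).toMap
      private val knownHosts = mutable.Set[String]() ++= initialHosts
    
      val pendingTasksForHost = mutable.Map[String, mutable.ArrayBuffer[Int]]()
      val pendingTasksWithNoPrefs = mutable.ArrayBuffer[Int]()
    
      // Initial bucketing: a task lands in pendingTasksWithNoPrefs either
      // because it truly has no preference, or because none of its
      // preferred hosts are currently available.
      tasks.foreach { t =>
        val live = t.preferredHosts.filter(knownHosts.contains)
        if (live.isEmpty) pendingTasksWithNoPrefs += t.id
        else live.foreach { h =>
          pendingTasksForHost.getOrElseUpdate(h, mutable.ArrayBuffer()) += t.id
        }
      }
    
      // The proposed hook, mirroring the existing "executor lost" handling:
      // when a new host registers, move any task out of the no-prefs bucket
      // whose preferred locations include that host.
      def executorAdded(host: String): Unit = {
        knownHosts += host
        val (nowLocal, rest) = pendingTasksWithNoPrefs.partition { id =>
          taskById(id).preferredHosts.contains(host)
        }
        pendingTasksWithNoPrefs.clear()
        pendingTasksWithNoPrefs ++= rest
        nowLocal.foreach { id =>
          pendingTasksForHost.getOrElseUpdate(host, mutable.ArrayBuffer()) += id
        }
      }
    }
    ```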
    
    I have another concern about pendingTasksWithNoPrefs in TaskSetManager. 
This list currently mixes two kinds of tasks: tasks that truly have no 
preferred locations, and tasks whose preferred locations are not yet 
available. Tasks in this list are treated as PROCESS_LOCAL, which means they 
take precedence over tasks in the NODE_LOCAL and RACK_LOCAL lists. Therefore, 
in some scenarios (e.g. when the allowed locality level is PROCESS_LOCAL), 
TaskSetManager may prefer a task whose desired location is unavailable over a 
task that could actually run with some locality. I'm not sure whether this 
behavior is intended; the toy run below shows the dequeue order I mean.
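    
    A toy run built on the sketch above (it reuses the hypothetical `Task` 
and `SimpleTaskSetManager` definitions, so the two blocks run together): task 
1's preferred host is down, yet it lands in the same bucket as task 2, which 
genuinely has no preference, and both would be offered ahead of the 
NODE_LOCAL task 0:
    
    ```scala
    // Toy run; assumes Task and SimpleTaskSetManager from the previous
    // sketch are in scope.
    object LocalityDemo {
      def main(args: Array[String]): Unit = {
        val tasks = Seq(
          Task(0, Seq("hostA")), // preferred host is up: goes to host list
          Task(1, Seq("hostB")), // preferred host is down: falls into no-prefs
          Task(2, Seq())         // genuinely has no preference
        )
        val mgr = new SimpleTaskSetManager(tasks, initialHosts = Set("hostA"))
    
        // Tasks 1 and 2 share the no-prefs bucket, which is treated as
        // PROCESS_LOCAL and hence dequeued before the NODE_LOCAL list.
        println(mgr.pendingTasksWithNoPrefs)      // ArrayBuffer(1, 2)
        println(mgr.pendingTasksForHost("hostA")) // ArrayBuffer(0)
    
        // With the proposed executorAdded hook, task 1 is re-bucketed once
        // its preferred host actually registers.
        mgr.executorAdded("hostB")
        println(mgr.pendingTasksWithNoPrefs)      // ArrayBuffer(2)
        println(mgr.pendingTasksForHost("hostB")) // ArrayBuffer(1)
      }
    }
    ```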

