GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/3779
[SPARK-4939] move to next locality when no pending tasks for executors
Currently, if a task set contains tasks with different locality levels, the
NODE_LOCAL tasks are only scheduled after all the PROCESS_LOCAL tasks have been
scheduled and the `spark.locality.wait.process` timeout (3 seconds by default)
has expired. In local mode, the LocalScheduler never calls resourceOffer()
again once it fails to get a task at the same locality level, so the
NODE_LOCAL tasks are never scheduled.
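The following simplified Python model of delay scheduling makes the hang concrete. It is not Spark's `TaskSetManager` code. The three-level `Locality` enum, `allowed_level`, and `offer` are hypothetical names, and each pending task is reduced to the single locality level it can achieve on the offered executor. In this model, locality is relaxed only after `wait` seconds pass. As a result, an offer that arrives before the timeout launches nothing from a task set that holds only NODE_LOCAL tasks.

```python
from enum import IntEnum


class Locality(IntEnum):
    # Simplified subset of Spark's locality levels, most local first.
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    ANY = 2


def allowed_level(index, last_launch, now, wait):
    """Time-based fallback only: relax locality by one level for every
    `wait` seconds elapsed since the last launch at the current level.
    Returns the (possibly relaxed) level and the updated reference time."""
    while index < Locality.ANY and now - last_launch >= wait:
        last_launch += wait  # charge one wait period to the level we leave
        index = Locality(index + 1)
    return index, last_launch


def offer(pending, index, last_launch, now, wait=3.0):
    """Model one resourceOffer() call. `pending` holds, for each task, the
    locality level it can achieve on the offered executor. Launches the
    first task allowed at time `now` and returns (task, index, last_launch);
    task is None if nothing could be launched."""
    index, last_launch = allowed_level(index, last_launch, now, wait)
    for task in pending:
        if task <= index:
            pending.remove(task)
            # After a launch, go back to the task's level and reset the clock.
            return task, task, now
    return None, index, last_launch
```

With `pending = [Locality.NODE_LOCAL, Locality.NODE_LOCAL]`, calling `offer(pending, Locality.PROCESS_LOCAL, 0.0, now=0.0)` launches nothing. A later call at `now=3.5` launches one NODE_LOCAL task. In local mode, that second offer never happens, so the tasks stay pending forever.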
This bug can be reproduced by running the example
python/streaming/stateful_network_wordcount.py, which hangs after finishing a
batch that contains some data.
This patch removes the `execId` from `pendingTasksForExecutor` when there are
no more tasks for that executor; this is checked once for each launched
PROCESS_LOCAL task. We can then check whether any PROCESS_LOCAL tasks remain
by calling `pendingTasksForExecutor.isEmpty`, which is cheap. Finally, we can
move to the next locality level as soon as no PROCESS_LOCAL tasks remain,
without waiting `spark.locality.wait.process` seconds.
The tasks in `pendingTasksForHost` and `pendingTasksForRack` are the same set
of tasks, so we don't need this logic for them.
cc @tdas
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark local_streaming
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3779.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3779
----
commit 7d8c5a5e83fb2516babf4c3fd99e8392edffadb6
Author: Davies Liu <[email protected]>
Date: 2014-12-23T19:49:04Z
jump to next locality if no pending tasks for executors
----