pgandhi999 commented on a change in pull request #22774:
[SPARK-25780][CORE]Scheduling the tasks which have no higher level locality
first
URL: https://github.com/apache/spark/pull/22774#discussion_r243750189
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -397,7 +397,22 @@ private[spark] class TaskSetManager(
}
if (TaskLocality.isAllowed(maxLocality, TaskLocality.NODE_LOCAL)) {
- for (index <- dequeueTaskFromList(execId, host, getPendingTasksForHost(host))) {
+ val tasksForHost = getPendingTasksForHost(host)
+ var tasksForExecutor = new ArrayBuffer[Int]
+ for (index <- tasksForHost) {
+ val allTasksForExecutor = pendingTasksForExecutor.valuesIterator
+ while (allTasksForExecutor.hasNext) {
Review comment:
Rather than using a nested loop, could we first iterate over
`allTasksForExecutor` and build a HashSet of all the task indexes, then simply
loop through `tasksForHost` and check whether the HashSet contains each index
before adding it to `tasksForExecutor`? This would improve the running time
from O(n^2) to O(n).
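A minimal sketch of the idea, not the PR's actual code: the object name `LocalityFilterSketch`, the helper `tasksAlsoPendingOnExecutors`, and the `HashMap[String, ArrayBuffer[Int]]` shape of `pendingTasksForExecutor` are assumptions made for illustration. It also assumes an index is kept when the set contains it; if the intent is the opposite, negate the condition.

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet}

object LocalityFilterSketch {
  // Hypothetical helper: returns the host-level task indexes that also
  // appear in at least one executor's pending list.
  def tasksAlsoPendingOnExecutors(
      tasksForHost: Seq[Int],
      pendingTasksForExecutor: HashMap[String, ArrayBuffer[Int]]): ArrayBuffer[Int] = {
    // Single pass over every executor's pending list to build the lookup set.
    val executorTaskIndexes = new HashSet[Int]
    for (tasks <- pendingTasksForExecutor.valuesIterator) {
      executorTaskIndexes ++= tasks
    }

    // Single pass over the host's tasks; each membership check is
    // expected constant time, so no inner loop is needed.
    val tasksForExecutor = new ArrayBuffer[Int]
    for (index <- tasksForHost if executorTaskIndexes.contains(index)) {
      tasksForExecutor += index
    }
    tasksForExecutor
  }
}
```

With n host-level tasks and m entries across all executor lists, this does O(n + m) expected work instead of O(n * m) for the nested loop, at the cost of one extra set allocation per call.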