GitHub user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34214315
--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager(
def isExecutorIdle(executorId: String): Boolean = {
!executorIdToTaskIds.contains(executorId)
}
+
+  /**
+   * Get the number of locality aware pending tasks and related locality preferences as the
+   * hints used for executor allocation.
+   */
+  def executorPlacementHints(): (Int, Map[String, Int]) =
+    allocationManager.synchronized {
+      var localityAwarePendingTasks: Int = 0
+      val localityToCount = new mutable.HashMap[String, Int]()
+      stageIdToPreferredLocations.values.foreach { localities =>
--- End diff --
Given that the values in stageIdToPreferredLocations don't change after the
stage is submitted, it seems like we're going to redo a bunch of work each time
we sync with the YarnAllocator. Can we save intermediate information to avoid
this?
Also, as currently written, this doesn't differentiate between pending,
running, and completed tasks: all tasks for a stage are counted as pending as
long as the stage is still running. How computationally expensive would it be
to keep this up to date as tasks start and complete?
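To illustrate the suggestion, here is a minimal sketch of keeping the hints incremental rather than recomputing them on every sync. All names here (`LocalityHints`, `taskPending`, `taskNoLongerPending`, `placementHints`) are hypothetical, not Spark's actual API; the point is only that a running host-to-pending-count map, adjusted when tasks start and complete, makes each allocator sync O(distinct hosts) instead of a full re-scan of every stage's preferred locations:

```scala
import scala.collection.mutable

// Hypothetical sketch: maintain a running count of locality-aware pending
// tasks and a host -> pending-task-count map, updated incrementally so the
// allocator sync only snapshots precomputed state.
class LocalityHints {
  private var localityAwarePendingTasks = 0
  private val hostToPendingCount = new mutable.HashMap[String, Int]()

  // Called when a task with locality preferences becomes pending.
  def taskPending(hosts: Seq[String]): Unit = synchronized {
    if (hosts.nonEmpty) {
      localityAwarePendingTasks += 1
      hosts.foreach { h =>
        hostToPendingCount(h) = hostToPendingCount.getOrElse(h, 0) + 1
      }
    }
  }

  // Called when a previously pending task starts running or completes,
  // with the same locality preferences it was registered with.
  def taskNoLongerPending(hosts: Seq[String]): Unit = synchronized {
    if (hosts.nonEmpty) {
      localityAwarePendingTasks -= 1
      hosts.foreach { h =>
        val remaining = hostToPendingCount.getOrElse(h, 0) - 1
        if (remaining <= 0) hostToPendingCount -= h
        else hostToPendingCount(h) = remaining
      }
    }
  }

  // Cheap snapshot handed to the allocator on each sync.
  def placementHints(): (Int, Map[String, Int]) = synchronized {
    (localityAwarePendingTasks, hostToPendingCount.toMap)
  }
}
```

The trade-off the review comment raises is exactly this: two extra map updates per task transition, in exchange for not re-walking every stage's preferred locations on each heartbeat.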