tgravescs commented on a change in pull request #28656:
URL: https://github.com/apache/spark/pull/28656#discussion_r432515370



##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -1107,10 +1107,19 @@ private[spark] class TaskSetManager(
   def recomputeLocality(): Unit = {
     // A zombie TaskSetManager may reach here while executorLost happens
     if (isZombie) return
+    val previousLocalityIndex = currentLocalityIndex
     val previousLocalityLevel = myLocalityLevels(currentLocalityIndex)
+    val previousMyLocalityLevels = myLocalityLevels
     myLocalityLevels = computeValidLocalityLevels()
     localityWaits = myLocalityLevels.map(getLocalityWait)
     currentLocalityIndex = getLocalityIndex(previousLocalityLevel)
+    if (currentLocalityIndex > previousLocalityIndex) {
+      // SPARK-31837: If the new level is more local, shift to the new most local locality
+      // level in terms of better data locality. For example, say the previous locality
+      // levels are [PROCESS, NODE, ANY] and current level is ANY. After recompute, the
+      // locality levels are [PROCESS, NODE, RACK, ANY]. Then, we'll shift to RACK level.
+      currentLocalityIndex = getLocalityIndex(myLocalityLevels.diff(previousMyLocalityLevels).head)

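For readers skimming the diff, here is a minimal, self-contained sketch of what the new lines compute; the object name LocalityShiftSketch and the stand-in enumeration are illustrative inventions, not Spark's (the real levels live in org.apache.spark.scheduler.TaskLocality):

    object LocalityShiftSketch {
      // Stand-in for Spark's TaskLocality levels, from most to least local.
      object TaskLocality extends Enumeration {
        val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
      }
      import TaskLocality._

      def main(args: Array[String]): Unit = {
        val previousLevels = Array(PROCESS_LOCAL, NODE_LOCAL, ANY)  // before recompute
        val recomputed     = Array(PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY)

        // diff() keeps only the levels that exist after the recompute,
        // preserving order, so head is the most local newly valid level.
        val newlyAdded = recomputed.diff(previousLevels)  // Array(RACK_LOCAL)
        println(s"shift from ANY to ${newlyAdded.head}")  // shift from ANY to RACK_LOCAL
      }
    }

Under that assumption, the patch moves the task set from the ANY slot down to the newly available RACK slot rather than leaving it stuck at ANY.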
Review comment:
       Yes, this is one of the cases I was referring to. Ideally you would never run into it, because a host is always on a rack, so the rack information would always be available. Unfortunately, Spark defaults the rack to None, so you can. I was going to improve on that in the JIRA I filed; we can certainly handle some of it here if you want.
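
For context on the "defaults the rack to None" point: below is a rough, hypothetical sketch of the kind of hook being described, under the assumption that rack resolution is an overridable per-cluster-manager method. The class names are placeholders, and the exact method shape differs across Spark versions, so treat this as illustration rather than the actual code in this branch.

    // Hypothetical sketch, not Spark's actual code: rack lookup as an
    // overridable hook whose base default returns no rack, so every host
    // falls through to non-rack locality unless a deployment (e.g. YARN)
    // plugs in real topology information.
    class SchedulerWithRackHook {
      def getRackForHost(host: String): Option[String] = None  // default: no rack
    }

    class YarnAwareScheduler extends SchedulerWithRackHook {
      // A deployment that knows its topology would override the default.
      override def getRackForHost(host: String): Option[String] =
        Some("/default-rack")  // placeholder value for illustration
    }

With the default in place, RACK_LOCAL never appears among the valid locality levels, which is what makes the recompute-and-shift case above reachable in practice.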



