Ngone51 commented on a change in pull request #28656:
URL: https://github.com/apache/spark/pull/28656#discussion_r432524274
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -1107,10 +1107,19 @@ private[spark] class TaskSetManager(
def recomputeLocality(): Unit = {
// A zombie TaskSetManager may reach here while executorLost happens
if (isZombie) return
+ val previousLocalityIndex = currentLocalityIndex
val previousLocalityLevel = myLocalityLevels(currentLocalityIndex)
+ val previousMyLocalityLevels = myLocalityLevels
myLocalityLevels = computeValidLocalityLevels()
localityWaits = myLocalityLevels.map(getLocalityWait)
currentLocalityIndex = getLocalityIndex(previousLocalityLevel)
+ if (currentLocalityIndex > previousLocalityIndex) {
+ // SPARK-31837: If the new level is more local, shift to the new most local locality
+ // level in terms of better data locality. For example, say the previous locality
+ // levels are [PROCESS, NODE, ANY] and current level is ANY. After recompute, the
+ // locality levels are [PROCESS, NODE, RACK, ANY]. Then, we'll shift to RACK level.
+ currentLocalityIndex = getLocalityIndex(myLocalityLevels.diff(previousMyLocalityLevels).head)
Review comment:
> We can certainly handle some here if you want
What do you mean by "handle some here"? I read your JIRA but couldn't find a
specific solution that could be added to this PR. Could you please elaborate?
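
For reference, the index shift in the diff above can be reproduced outside of Spark with a minimal sketch. Everything here is an assumption for illustration: `LocalityShiftSketch` and `recomputeIndex` are hypothetical names, levels are plain strings rather than `TaskLocality` values, and `indexOf` stands in for `getLocalityIndex` (which only matches when the previous level still exists after the recompute).

```scala
// Standalone sketch of the shift described in the diff comment.
// All names are hypothetical; this is not Spark's TaskSetManager.
object LocalityShiftSketch {

  // previousLevels / newLevels are ordered from most local to least local.
  def recomputeIndex(
      previousLevels: Seq[String],
      previousIndex: Int,
      newLevels: Seq[String]): Int = {
    val previousLevel = previousLevels(previousIndex)
    // Stand-in for getLocalityIndex: assumes previousLevel is still present.
    val index = newLevels.indexOf(previousLevel)
    if (index > previousIndex) {
      // Levels that appear only after the recompute, in newLevels order,
      // so the first one is the most local of the newly added levels.
      val added = newLevels.diff(previousLevels)
      // headOption (instead of head) is a defensive choice of this sketch.
      added.headOption.map(level => newLevels.indexOf(level)).getOrElse(index)
    } else {
      index
    }
  }

  def main(args: Array[String]): Unit = {
    val before = Seq("PROCESS", "NODE", "ANY")
    val after = Seq("PROCESS", "NODE", "RACK", "ANY")
    // The current level is ANY (index 2) before the recompute.
    val shifted = recomputeIndex(before, 2, after)
    println(after(shifted)) // prints "RACK"
  }
}
```

With the example from the code comment, the previous level ANY moves from index 2 to index 3, so the sketch falls back to the first newly added level, RACK.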