Ngone51 commented on a change in pull request #28656:
URL: https://github.com/apache/spark/pull/28656#discussion_r432318441
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -1107,10 +1107,19 @@ private[spark] class TaskSetManager(
def recomputeLocality(): Unit = {
// A zombie TaskSetManager may reach here while executorLost happens
if (isZombie) return
+ val previousLocalityIndex = currentLocalityIndex
val previousLocalityLevel = myLocalityLevels(currentLocalityIndex)
+ val previousMyLocalityLevels = myLocalityLevels
myLocalityLevels = computeValidLocalityLevels()
localityWaits = myLocalityLevels.map(getLocalityWait)
currentLocalityIndex = getLocalityIndex(previousLocalityLevel)
+ if (currentLocalityIndex > previousLocalityIndex) {
+ // SPARK-31837: If the new level is more local, shift to the new most local locality
+ // level in terms of better data locality. For example, say the previous locality
+ // levels are [PROCESS, NODE, ANY] and current level is ANY. After recompute, the
+ // locality levels are [PROCESS, NODE, RACK, ANY]. Then, we'll shift to RACK level.
+ currentLocalityIndex = getLocalityIndex(myLocalityLevels.diff(previousMyLocalityLevels).head)
Review comment:
Hi all, there's a defect in the previous implementation (it always reset
`currentLocalityIndex` to 0). Consider this case: we have locality levels
[PROCESS, NODE, ANY] and the current locality level is ANY. After recompute,
we might have locality levels [PROCESS, NODE, RACK, ANY]. In this case, I think
we'd better shift to the RACK level instead of the PROCESS level, since the
TaskSetManager has already been delayed for a while on the known levels
(PROCESS, NODE). With this update, I think we can also ease our concern about
the possible perf regression introduced by aggressive locality level resetting.
@bmarcott @tgravescs @cloud-fan
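
To make the case above concrete, here is a minimal standalone sketch of the index shift. It is not the actual `TaskSetManager` code: `Locality`, `RecomputeSketch`, `indexOf`, and `shiftedIndex` are hypothetical names used only for illustration, and `indexOf` is assumed to behave like `getLocalityIndex` (first level that is at least as relaxed as the requested one). It also uses `headOption` rather than `.head` as a defensive choice.

```scala
// Hypothetical stand-in for Spark's locality levels, ordered most local first.
object Locality extends Enumeration {
  val PROCESS, NODE, RACK, ANY = Value
}

object RecomputeSketch {
  // Assumed to mirror getLocalityIndex: index of the first level in `levels`
  // that is at least as relaxed as `level`; falls back to the last index.
  def indexOf(levels: Seq[Locality.Value], level: Locality.Value): Int = {
    val i = levels.indexWhere(_ >= level)
    if (i >= 0) i else levels.length - 1
  }

  // Returns the locality index to use after the valid levels were recomputed.
  def shiftedIndex(
      previousLevels: Seq[Locality.Value],
      previousIndex: Int,
      newLevels: Seq[Locality.Value]): Int = {
    // First carry the previous level over into the new list of levels.
    val carried = indexOf(newLevels, previousLevels(previousIndex))
    if (carried > previousIndex) {
      // A more local level was inserted ahead of the current one:
      // move to the most local of the newly added levels.
      newLevels.diff(previousLevels).headOption
        .map(indexOf(newLevels, _))
        .getOrElse(carried)
    } else {
      carried
    }
  }

  def main(args: Array[String]): Unit = {
    import Locality._
    val before = Seq(PROCESS, NODE, ANY)
    val after = Seq(PROCESS, NODE, RACK, ANY)
    val idx = shiftedIndex(before, previousIndex = 2, newLevels = after)
    println(after(idx)) // prints "RACK"
  }
}
```

In this sketch, a manager sitting at NODE (index 1) keeps index 1 after the same recompute, because the newly added RACK level is less local than its current level.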