hiboyang commented on a change in pull request #31876:
URL: https://github.com/apache/spark/pull/31876#discussion_r618097182



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -162,7 +162,7 @@ private class ShuffleStatus(numPartitions: Int) extends Logging {
    */
   def removeOutputsOnHost(host: String): Unit = withWriteLock {
     logDebug(s"Removing outputs for host ${host}")
-    removeOutputsByFilter(x => x.asInstanceOf[BlockManagerId].host == host)
+    removeOutputsByFilter(x => x.asInstanceOf[HostLocation].host == host)

Review comment:
       `removeOutputsOnHost`/`removeOutputsOnExecutor` may be called from other
Spark internal code, and the caller may not know, or may not be able to
guarantee, that the underlying `location` is a `HostLocation`/`ExecutorLocation`.
   
   For example, in a Remote/Disaggregated Shuffle Service the location could be
a custom class (neither `HostLocation` nor `ExecutorLocation`), because the
shuffle data lives somewhere else (on a shuffle server or in remote storage).
A shuffle `FetchFailed` failure will still trigger calls to
`removeOutputsOnHost`/`removeOutputsOnExecutor`, and the `asInstanceOf[...]`
here will then throw a `ClassCastException`.
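   A minimal REPL-style sketch of that failure mode, using hypothetical
stand-in types (none of these names are the actual classes/traits from this
PR):

   ```scala
   // Hypothetical stand-in types, used only to illustrate the failure mode.
   trait ShuffleLocation
   case class HostLocation(host: String) extends ShuffleLocation
   case class RemoteStorageLocation(uri: String) extends ShuffleLocation

   // A remote/disaggregated shuffle backend might register a location that
   // carries no host at all.
   val loc: ShuffleLocation = RemoteStorageLocation("s3://bucket/app-1/shuffle_0")

   // Mirrors the filter predicate above: the unconditional cast fails at
   // runtime with a ClassCastException because loc is not a HostLocation.
   loc.asInstanceOf[HostLocation].host == "some-host"
   ```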
   
   Adding an `isInstanceOf[...]` check to the filter predicate would make this
code generic enough to support a Remote/Disaggregated Shuffle Service, e.g. as
sketched below.
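   For concreteness, one possible shape of that guard, written against the
method shown in the diff above (a sketch only, not the exact patch being
proposed):

   ```scala
   def removeOutputsOnHost(host: String): Unit = withWriteLock {
     logDebug(s"Removing outputs for host ${host}")
     // Skip locations that are not HostLocation (e.g. a custom remote shuffle
     // location) instead of failing on the unconditional cast.
     removeOutputsByFilter(x =>
       x.isInstanceOf[HostLocation] && x.asInstanceOf[HostLocation].host == host)
   }
   ```

   A pattern-matching predicate
(`{ case loc: HostLocation => loc.host == host; case _ => false }`) would be an
equivalent and slightly more idiomatic alternative.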
   



