hiboyang commented on a change in pull request #31876:
URL: https://github.com/apache/spark/pull/31876#discussion_r618097182
##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -162,7 +162,7 @@ private class ShuffleStatus(numPartitions: Int) extends
Logging {
*/
def removeOutputsOnHost(host: String): Unit = withWriteLock {
logDebug(s"Removing outputs for host ${host}")
- removeOutputsByFilter(x => x.asInstanceOf[BlockManagerId].host == host)
+ removeOutputsByFilter(x => x.asInstanceOf[HostLocation].host == host)
Review comment:
`removeOutputsOnHost`/`removeOutputsOnExecutor` may be called by other
Spark internal code, and the caller may not know, or cannot guarantee, that
the underlying `location` is a `HostLocation`/`ExecutorLocation`.
For example, in a Remote/Disaggregated Shuffle Service, the location could be
a custom class (not `HostLocation`/`ExecutorLocation`), since the shuffle data
lives elsewhere (on a shuffle server or in remote storage). When a shuffle
`FetchFailed` occurs, the failure triggers calls to
`removeOutputsOnHost`/`removeOutputsOnExecutor`, and the `asInstanceOf[...]`
here will throw a `ClassCastException`.
Adding an `isInstanceOf[...]` check to the filter predicate would make this
code generic enough to support a Remote/Disaggregated Shuffle Service.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]