xuanyuanking commented on a change in pull request #26095: 
[SPARK-29435][Core]Shuffle is not working when 
spark.shuffle.useOldFetchProtocol=true
URL: https://github.com/apache/spark/pull/26095#discussion_r334270268
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
 ##########
 @@ -905,15 +905,8 @@ private[spark] object MapOutputTracker extends Logging {
         for (part <- startPartition until endPartition) {
           val size = status.getSizeForBlock(part)
           if (size != 0) {
-            if (useOldFetchProtocol) {
 
 Review comment:
   Thanks for the report and fix! The root cause here is that when we set 
`useOldFetchProtocol`, the map id used on the reader side and the writer 
side is inconsistent.
   But we can't fix it like this, because when `useOldFetchProtocol=true`, we'll 
use the old version of the fetch protocol, `OpenBlocks`, which assumes the map id 
is an int and will directly parse the string into an Integer. See here: 
https://github.com/apache/spark/blob/148cd26799c69ab9cfdc2b3b8000a194c12518b8/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java#L296
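   To illustrate the pitfall (a minimal standalone sketch, not the actual `ExternalBlockHandler` code): the old protocol splits the shuffle block id string and forces the map id through `Integer.parseInt`, so any map id that only fits in a Long (such as a unique task attempt id) fails to parse. The block id layout `shuffle_<shuffleId>_<mapId>_<reduceId>` is real; the object and method names below are hypothetical.

```scala
// Hypothetical demo of the old-protocol parsing pitfall: the legacy
// OpenBlocks path parses the map id segment of a shuffle block id as
// an Int, so a Long-sized task attempt id cannot round-trip.
object OldProtocolParseDemo {
  // Parse "shuffle_<shuffleId>_<mapId>_<reduceId>" the way the old
  // protocol does: the map id is forced into an Int.
  def parseMapIdAsInt(blockId: String): Int = {
    val parts = blockId.split("_")
    Integer.parseInt(parts(2))
  }

  def main(args: Array[String]): Unit = {
    // A small map index fits in an Int, so parsing succeeds:
    println(parseMapIdAsInt("shuffle_0_7_0")) // prints 7

    // A globally unique task attempt id can exceed Int range and fails:
    try {
      parseMapIdAsInt("shuffle_0_123456789012_0")
    } catch {
      case e: NumberFormatException =>
        println(s"old protocol cannot parse this map id: ${e.getMessage}")
    }
  }
}
```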
   
   So the right way, I think, is to do the fix in `ShuffleWriteProcessor`: we 
should fill mapId with `mapTaskId` or `mapIndex` depending on the config 
`spark.shuffle.useOldFetchProtocol`.
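   A minimal sketch of the suggested selection logic (not Spark's actual `ShuffleWriteProcessor` API; the object and method names are hypothetical, only the config name and the `mapTaskId`/`mapIndex` distinction come from the discussion above):

```scala
// Sketch: the writer picks which id to publish as the map id based on
// spark.shuffle.useOldFetchProtocol, so reader and writer stay consistent.
object MapIdSelection {
  /**
   * @param mapTaskId            globally unique task attempt id (a Long)
   * @param mapIndex             partition index of the map task (fits in an Int)
   * @param useOldFetchProtocol  value of spark.shuffle.useOldFetchProtocol
   */
  def selectMapId(mapTaskId: Long, mapIndex: Int, useOldFetchProtocol: Boolean): Long = {
    // The old protocol (OpenBlocks) parses the map id as an Int, so only
    // the map index is safe there; the new protocol can carry the Long
    // task attempt id.
    if (useOldFetchProtocol) mapIndex.toLong else mapTaskId
  }
}
```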

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
