liupc opened a new pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix 
application failed due to failed to get MapStatuses broadcast block
URL: https://github.com/apache/spark/pull/27604
 
 
   
   
   ### What changes were proposed in this pull request?
   
   As described in 
[SPARK-30849](https://issues.apache.org/jira/browse/SPARK-30849), a Spark 
application will sometimes fail because an executor fails to get the 
mapStatuses broadcast block. 
   ```
   Job aborted due to stage failure: Task 18 in stage 2.0 failed 4 times, most recent failure: Lost task 18.3 in stage 2.0 (TID 13819, xxxx , executor 8): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_9_piece1 of broadcast_9
   java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_9_piece1 of broadcast_9
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1287)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at org.apache.spark.MapOutputTracker$$anonfun$deserializeMapStatuses$1.apply(MapOutputTracker.scala:775)
        at org.apache.spark.MapOutputTracker$$anonfun$deserializeMapStatuses$1.apply(MapOutputTracker.scala:775)
        at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
        at org.apache.spark.MapOutputTracker$.logInfo(MapOutputTracker.scala:712)
        at org.apache.spark.MapOutputTracker$.deserializeMapStatuses(MapOutputTracker.scala:774)
        at org.apache.spark.MapOutputTrackerWorker.getStatuses(MapOutputTracker.scala:665)
        at org.apache.spark.MapOutputTrackerWorker.getMapSizesByExecutorId(MapOutputTracker.scala:603)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:57)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:109)
   ```
   This happens because the mapStatuses broadcast id is sent to the executor, 
but the driver invalidates the broadcast before the executor actually fetches it.
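
The ordering problem can be illustrated with a toy model. This is only a sketch of the race described above, not Spark code: `ToyDriver`, `publish`, `invalidate`, `fetch` and `run_race` are hypothetical names invented for illustration, and a single in-memory dict stands in for the driver-side broadcast storage.

```python
# Toy model of the race; none of these names are Spark APIs.

class ToyDriver:
    def __init__(self):
        self.blocks = {}   # broadcast id -> stored map statuses
        self.next_id = 0

    def publish(self, statuses):
        """Store statuses as a broadcast and return the id sent to executors."""
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = list(statuses)
        return bid

    def invalidate(self, bid):
        """Drop the broadcast, as the driver does when it invalidates it."""
        self.blocks.pop(bid, None)

    def fetch(self, bid):
        """Executor-side fetch by id; fails if the broadcast is already gone."""
        if bid not in self.blocks:
            raise IOError(f"Failed to get broadcast_{bid}")
        return self.blocks[bid]


def run_race(invalidate_before_fetch):
    driver = ToyDriver()
    bid = driver.publish(["map-0", "map-1"])  # 1. id is sent to the executor
    if invalidate_before_fetch:
        driver.invalidate(bid)                # 2. driver invalidates it first
    try:
        return driver.fetch(bid)              # 3. executor performs the fetch
    except IOError as e:
        return str(e)
```

Calling `run_race(True)` hits the failure path because the id the executor holds no longer refers to a stored block, while `run_race(False)` returns the statuses.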
   
   This PR will try to fix this issue.
   
   
   ### Why are the changes needed?
   Bugfix
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   Unit tests
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
