gaoyajun02 commented on code in PR #38333:
URL: https://github.com/apache/spark/pull/38333#discussion_r1012466373


##########
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:
##########
@@ -794,7 +794,15 @@ final class ShuffleBlockFetcherIterator(
             // since the last call.
             val msg = s"Received a zero-size buffer for block $blockId from 
$address " +
               s"(expectedApproxSize = $size, 
isNetworkReqDone=$isNetworkReqDone)"
-            throwFetchFailedException(blockId, mapIndex, address, new 
IOException(msg))
+            if (blockId.isShuffleChunk) {
+              logWarning(msg)
+              
pushBasedFetchHelper.initiateFallbackFetchForPushMergedBlock(blockId, address)

Review Comment:
   Did you mean PushMergedRemoteMetaFetchResult?
   The size of the push-merged block itself is not zero. Since the size of each chunk cannot be obtained on the reduce side, the zero-size case is logged by the following server-side code, and we confirmed that the index file contains consecutive identical offsets:
   https://github.com/apache/spark/blob/9a7596e1dde0f1dd596aa6d3b2efbcb5d1ef70ea/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala#L500
   
   Based on the hardware-layer error information, we have basically determined that the data loss occurs while the data is being written.
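   
   For reference on the index-file check above: the index file stores one cumulative offset per partition plus a final end offset, so a zero-length block shows up as two consecutive identical offsets. A minimal sketch of such an offline check, assuming the standard index-file layout (IndexFileCheck and findZeroLengthPartitions are hypothetical names, not Spark APIs):
   
   ```scala
   import java.io.{DataInputStream, File, FileInputStream}
   
   // Hypothetical offline check, not a Spark API: a shuffle index file holds
   // (numPartitions + 1) cumulative offsets into the data file, so partition i
   // is empty exactly when offsets(i) == offsets(i + 1).
   object IndexFileCheck {
     def findZeroLengthPartitions(indexFile: File): Seq[Int] = {
       val in = new DataInputStream(new FileInputStream(indexFile))
       try {
         val numOffsets = (indexFile.length() / 8).toInt
         // readLong matches the big-endian longs that the index writer emits.
         val offsets = Array.fill(numOffsets)(in.readLong())
         (0 until numOffsets - 1).filter(i => offsets(i) == offsets(i + 1))
       } finally {
         in.close()
       }
     }
   }
   ```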
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

