gaoyajun02 commented on code in PR #38333:
URL: https://github.com/apache/spark/pull/38333#discussion_r1012466373
##########
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:
##########
@@ -794,7 +794,15 @@ final class ShuffleBlockFetcherIterator(
// since the last call.
val msg = s"Received a zero-size buffer for block $blockId from
$address " +
s"(expectedApproxSize = $size,
isNetworkReqDone=$isNetworkReqDone)"
- throwFetchFailedException(blockId, mapIndex, address, new
IOException(msg))
+ if (blockId.isShuffleChunk) {
+ logWarning(msg)
+
pushBasedFetchHelper.initiateFallbackFetchForPushMergedBlock(blockId, address)
Review Comment:
Did you mean PushMergedRemoteMetaFetchResult?
The size of the push-merged block itself is not zero. Since the size of
each chunk cannot be obtained on the reduce side, we printed the zero-size
log via the following server-side code and confirmed that the index file
contains the same offset at consecutive positions, i.e. zero-length chunks:
https://github.com/apache/spark/blob/9a7596e1dde0f1dd596aa6d3b2efbcb5d1ef70ea/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala#L500
Based on the hardware-layer error information, we then concluded that the
data loss most likely occurred while the data was being written.
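
For context, here is a minimal sketch of that consecutive-offset check. It
assumes the standard shuffle index file layout, which stores
numPartitions + 1 cumulative offsets into the data file, so partition i
spans [offsets(i), offsets(i + 1)) and two equal adjacent offsets mean a
zero-length chunk. This is not the actual IndexShuffleBlockResolver code;
the object and method names below are hypothetical.

import java.io.{DataInputStream, File, FileInputStream}

// Hypothetical sketch (not the real IndexShuffleBlockResolver code):
// scan a shuffle index file and report which partitions are zero-length,
// which is the condition the server-side log above reports.
object ZeroSizeChunkCheck {
  def zeroSizePartitions(indexFile: File, numPartitions: Int): Seq[Int] = {
    val in = new DataInputStream(new FileInputStream(indexFile))
    try {
      // numPartitions + 1 big-endian longs: cumulative offsets into the
      // data file; partition i spans [offsets(i), offsets(i + 1)).
      val offsets = Array.fill(numPartitions + 1)(in.readLong())
      // Equal consecutive offsets imply a zero-size block.
      (0 until numPartitions).filter(i => offsets(i) == offsets(i + 1))
    } finally {
      in.close()
    }
  }
}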