Victsm commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r659076673
##########
File path:
common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##########
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
handler.addRpcRequest(requestId, callback);
RpcChannelListener listener = new RpcChannelListener(requestId, callback);
channel.writeAndFlush(
- new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener);
+ new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener);
Review comment:
What about the following scenario:
1. An indeterminate stage generates the shuffle data for a given shuffle.
2. A downstream reduce stage experiences a shuffle fetch failure, leading to
a retry of the indeterminate stage.
3. Tasks from the retry of the indeterminate stage start pushing blocks,
which would invalidate the shuffle data from the 1st attempt.
4. In the meantime, we might still have dangling tasks from the first failed
reduce stage trying to fetch shuffle blocks corresponding to the 1st attempt of
the indeterminate stage.
Is the above scenario possible with indeterminate stage retry, and would we
run into issues if the seq ID is only used on the push side but not the fetch
side?
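
To make the concern concrete, here is a minimal sketch of what checking the sequence ID on both the push and the fetch side could look like. It is not the code in this PR: `MergedShuffleRegistry`, `onPush`, and `canServeFetch` are hypothetical names, and the sketch assumes the sequence ID increases with each retry of the indeterminate stage.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not Spark's shuffle service code.
 * Tracks the latest sequence ID seen for each shuffle so that both
 * pushes and fetches from an older stage attempt can be rejected.
 */
class MergedShuffleRegistry {
  // shuffleId -> highest sequence ID pushed so far (assumed to grow per retry)
  private final Map<Integer, Integer> latestSeqId = new ConcurrentHashMap<>();

  /**
   * Push side: records the sequence ID and returns true if the push belongs
   * to the latest known attempt. A higher ID supersedes older attempts.
   */
  boolean onPush(int shuffleId, int seqId) {
    int latest = latestSeqId.merge(shuffleId, seqId, Math::max);
    return latest == seqId;
  }

  /**
   * Fetch side: serves merged data only if the request carries the same
   * sequence ID as the latest attempt that has pushed blocks.
   */
  boolean canServeFetch(int shuffleId, int seqId) {
    Integer latest = latestSeqId.get(shuffleId);
    return latest != null && latest == seqId;
  }
}
```

Under these assumptions, step 1 pushes with sequence ID 0, step 3 pushes with sequence ID 1, and the dangling fetch in step 4 (still carrying 0) is rejected instead of reading data that the retry has invalidated. Without the fetch-side check, that stale request has no way to be told apart from a valid one.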
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]