[GitHub] [spark] Victsm commented on a change in pull request #33034: WIP: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle

GitBox Fri, 25 Jun 2021 16:06:19 -0700


Victsm commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r659076673




##########
File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##########
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
     handler.addRpcRequest(requestId, callback);
     RpcChannelListener listener = new RpcChannelListener(requestId, callback);
     channel.writeAndFlush(
-      new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener);
+      new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener);

Review comment:
   What about the following scenario:
   1. An indeterminate stage generates the shuffle data for a given shuffle.
   2. A downstream reduce stage hits a shuffle fetch failure, which triggers a retry of the indeterminate stage.
   3. Tasks from the retried indeterminate stage start pushing blocks, which invalidates the shuffle data from the 1st attempt.
   4. In the meantime, dangling tasks from the first, failed attempt of the reduce stage may still be trying to fetch shuffle blocks that correspond to the 1st attempt of the indeterminate stage.

   Is this scenario possible with indeterminate stage retry, and would we run into issues if the sequence ID is used only on the push side and not on the fetch side?
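
To make the concern in the scenario above concrete, here is a minimal Java sketch. It is not Spark's code: `SeqIdFetchSketch`, `push`, `fetchUnchecked`, and `fetchChecked` are hypothetical names, a `String` stands in for merged block data, and the sketch assumes the shuffle service keeps only the newest attempt's merged data per shuffle. Under those assumptions, a fetch that carries no sequence ID silently receives data from the retried attempt, while a fetch that carries one can be rejected as stale.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for merged-shuffle state on a shuffle service. */
public class SeqIdFetchSketch {
    // shuffleId -> newest sequence ID observed on the push side
    private final Map<Integer, Integer> latestSeqId = new HashMap<>();
    // shuffleId -> merged data for the newest attempt (a String stands in for block bytes)
    private final Map<Integer, String> mergedData = new HashMap<>();

    /** Push side: a newer sequence ID invalidates data merged for an older attempt. */
    public boolean push(int shuffleId, int seqId, String block) {
        int current = latestSeqId.getOrDefault(shuffleId, -1);
        if (seqId < current) {
            return false; // push from an older attempt is rejected
        }
        if (seqId > current) {
            latestSeqId.put(shuffleId, seqId);
            mergedData.remove(shuffleId); // drop data from the previous attempt
        }
        mergedData.merge(shuffleId, block, String::concat);
        return true;
    }

    /** Fetch side without a sequence ID: returns whatever attempt is currently stored. */
    public Optional<String> fetchUnchecked(int shuffleId) {
        return Optional.ofNullable(mergedData.get(shuffleId));
    }

    /** Fetch side with a sequence ID: returns data only if the attempt matches. */
    public Optional<String> fetchChecked(int shuffleId, int seqId) {
        if (latestSeqId.getOrDefault(shuffleId, -1) != seqId) {
            return Optional.empty(); // stale or unknown attempt
        }
        return Optional.ofNullable(mergedData.get(shuffleId));
    }

    public static void main(String[] args) {
        SeqIdFetchSketch service = new SeqIdFetchSketch();
        service.push(7, 0, "attempt0-block"); // step 1: first attempt of the stage
        service.push(7, 1, "attempt1-block"); // step 3: retry invalidates attempt 0

        // Step 4: a dangling reducer from the failed stage still expects attempt 0.
        System.out.println("unchecked fetch: " + service.fetchUnchecked(7));
        System.out.println("checked fetch:   " + service.fetchChecked(7, 0));
    }
}
```

In this sketch the unchecked fetch cannot tell the two attempts apart, which is the case the question above is asking about; whether the real fetch path can actually reach that state depends on how dangling reduce tasks are handled.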



