mridulm commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r661178449



##########
File path: 
common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##########
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
     handler.addRpcRequest(requestId, callback);
     RpcChannelListener listener = new RpcChannelListener(requestId, callback);
     channel.writeAndFlush(
-      new MergedBlockMetaRequest(requestId, appId, shuffleId, 
reduceId)).addListener(listener);
+      new MergedBlockMetaRequest(requestId, appId, shuffleId, 
shuffleSequenceId, reduceId)).addListener(listener);

Review comment:
       I did have a theoretical concern regarding pushing blocks actually - not 
reading (that, as discussed above, should be fine) : and I had discussed it 
with @otterc and @zhouyejoe 
   
   The scenario is as follows: Suppose a stage is cancelled/failed and a 
reattempt starts - with one or more tasks of the failed stage still running and 
pushing data. If ESS "knows" of a new shuffle sequence id (due to some other 
task from new stage attempt pushing data), it can reject all pushes for the 
'older' sequence ids for the shuffle id.
   
   The current code flow does not prevent the push of old stage task from 
completing - but also does not cause any incorrect shuffle reads in child 
stages (as earlier stage attempt has failed, it will never 'finalize' that 
shuffle output - and so never added to MapOutputTracker as merged location).
   But we were debating on whether the added validation while processing push 
blocks is worth it (this will require propagating the shuffle seq id to those 
requests as well).
   
   Thoughts @Ngone51 ?
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to