mridulm commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r661178449
##########
File path:
common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##########
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
handler.addRpcRequest(requestId, callback);
RpcChannelListener listener = new RpcChannelListener(requestId, callback);
channel.writeAndFlush(
-      new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener);
+      new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener);
Review comment:
I did have a theoretical concern regarding pushing blocks, not reading
(reading, as discussed above, should be fine), and I had discussed it with
@otterc and @zhouyejoe.
The scenario is as follows: suppose a stage is cancelled/failed and a
reattempt starts, with one or more tasks of the failed stage still running and
pushing data. If the ESS "knows" of a newer shuffle sequence id (because some
task from the new stage attempt has already pushed data), it can reject all
pushes carrying older sequence ids for that shuffle id.
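To make the proposed validation concrete, here is a minimal sketch (not the actual ESS implementation; the class and method names are hypothetical) of how a server could track the newest shuffle sequence id seen per shuffle id and reject stale pushes from a failed stage attempt:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the server-side stale-push check being debated.
public class PushValidator {
  // shuffleId -> newest shuffleSequenceId observed from any push.
  private final ConcurrentMap<Integer, Integer> latestSequenceIds =
      new ConcurrentHashMap<>();

  /**
   * Returns true if the push should be accepted. A push carrying a sequence
   * id older than the newest one seen for this shuffle is rejected, since a
   * newer stage attempt has already started pushing.
   */
  public boolean acceptPush(int shuffleId, int shuffleSequenceId) {
    // Atomically record the highest sequence id seen for this shuffle.
    int latest = latestSequenceIds.merge(shuffleId, shuffleSequenceId, Math::max);
    return shuffleSequenceId >= latest;
  }

  public static void main(String[] args) {
    PushValidator v = new PushValidator();
    if (!v.acceptPush(0, 0)) throw new AssertionError("first attempt accepted");
    if (!v.acceptPush(0, 1)) throw new AssertionError("new attempt accepted");
    if (v.acceptPush(0, 0)) throw new AssertionError("stale push rejected");
    System.out.println("ok");
  }
}
```

Note this only covers the server side; as discussed below, the real cost is propagating the shuffle sequence id into the push requests themselves.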
The current code flow does not prevent pushes from the old stage's tasks from
completing, but it also does not cause any incorrect shuffle reads in child
stages: since the earlier stage attempt has failed, it will never 'finalize'
that shuffle output, and so the output is never added to MapOutputTracker as a
merged location.
But we were debating whether the added validation while processing push
blocks is worth it (it would require propagating the shuffle sequence id to
those requests as well).
Thoughts @Ngone51 ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]