mridulm commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r661178449
##########
File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##########
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
handler.addRpcRequest(requestId, callback);
RpcChannelListener listener = new RpcChannelListener(requestId, callback);
channel.writeAndFlush(
- new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener);
+ new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener);
Review comment:
I did have a theoretical concern regarding reading blocks as the code evolves -
not reading as it works currently (that, as discussed above, should be fine) -
and I had discussed it with @otterc and @zhouyejoe.
The scenario is as follows: suppose a stage is cancelled/failed and a
reattempt starts, with one or more tasks of the failed stage still running and
reading data. If the ESS "knows" of a newer shuffle sequence id (because some
other task from the new stage attempt has pushed data), it can reject all
reads for the 'older' sequence ids for that shuffle id (a rough sketch of such
a check follows).
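To make that rejection concrete, here is a minimal sketch of the kind of
staleness check being discussed - `StaleReadValidator` and all of its member
names are illustrative, not the actual ESS code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch only; the real ESS tracks per-shuffle state differently.
class StaleReadValidator {
  // Latest shuffle sequence id observed per (appId, shuffleId).
  private final ConcurrentMap<String, Integer> latestSequenceIds =
      new ConcurrentHashMap<>();

  // Invoked when a push from a (possibly newer) stage attempt arrives.
  void recordPush(String appId, int shuffleId, int shuffleSequenceId) {
    latestSequenceIds.merge(appId + "_" + shuffleId, shuffleSequenceId, Math::max);
  }

  // Invoked before serving a read: once a newer sequence id is known, a read
  // carrying an older one must come from a zombie task of a failed attempt.
  void validateRead(String appId, int shuffleId, int shuffleSequenceId) {
    Integer latest = latestSequenceIds.get(appId + "_" + shuffleId);
    if (latest != null && shuffleSequenceId < latest) {
      throw new IllegalStateException("Stale shuffle sequence id "
          + shuffleSequenceId + " for shuffle " + shuffleId
          + "; latest known is " + latest);
    }
  }
}
```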
The current code flow does not prevent the reads of the old stage's tasks from
completing - but it also does not let this incorrect read affect child stages
(since the earlier stage attempt has failed, it will never 'finalize' that
shuffle output, and so it is never added to MapOutputTracker as a merged
location).
But we were debating whether the added validation while processing read-block
requests is worth it - it would require propagating the shuffle sequence id to
those requests as well, roughly as sketched below.
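For concreteness, a minimal sketch of what that propagation could look like,
mirroring the `MergedBlockMetaRequest` change in this diff -
`MergedBlockReadRequest` is a hypothetical message, not an actual Spark class:

```java
// Hypothetical read-request message that carries the shuffle sequence id so
// the ESS can apply the same staleness validation it applies to pushes.
final class MergedBlockReadRequest {
  final long requestId;
  final String appId;
  final int shuffleId;
  final int shuffleSequenceId; // the extra field to propagate
  final int reduceId;

  MergedBlockReadRequest(long requestId, String appId, int shuffleId,
      int shuffleSequenceId, int reduceId) {
    this.requestId = requestId;
    this.appId = appId;
    this.shuffleId = shuffleId;
    this.shuffleSequenceId = shuffleSequenceId;
    this.reduceId = reduceId;
  }
}
```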
Thoughts @Ngone51 ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]