otterc commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r656803962
##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -156,26 +157,31 @@ private AppShufflePartitionInfo getOrCreateAppShufflePartitionInfo(
@VisibleForTesting
AppShufflePartitionInfo newAppShufflePartitionInfo(
AppShuffleId appShuffleId,
+ int shuffleSequenceId,
int reduceId,
File dataFile,
File indexFile,
File metaFile) throws IOException {
- return new AppShufflePartitionInfo(appShuffleId, reduceId, dataFile,
+ return new AppShufflePartitionInfo(appShuffleId, shuffleSequenceId, reduceId, dataFile,
new MergeShuffleFile(indexFile), new MergeShuffleFile(metaFile));
}
@Override
- public MergedBlockMeta getMergedBlockMeta(String appId, int shuffleId, int reduceId) {
+ public MergedBlockMeta getMergedBlockMeta(
+ String appId,
+ int shuffleId,
+ int shuffleSequenceId,
+ int reduceId) {
AppShuffleId appShuffleId = new AppShuffleId(appId, shuffleId);
- File indexFile = getMergedShuffleIndexFile(appShuffleId, reduceId);
+ File indexFile = getMergedShuffleIndexFile(appShuffleId, shuffleSequenceId, reduceId);
Review comment:
> Do you mean to keep track of this information in addition to the existing information we have on shuffle service side?
Yes. Rather than modifying the fetch protocols, the server should track this information itself, or infer it from information it already has, if that is possible.
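As a rough illustration of tracking this on the server, here is a minimal sketch, assuming the shuffle service records the highest shuffleSequenceId it has seen in pushes for each (appId, shuffleId). `AppShuffleKey`, `ShuffleSequenceTracker`, `recordPush` and `latestSequenceId` are hypothetical names, not existing Spark APIs.

```java
import java.util.Objects;
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical key mirroring the (appId, shuffleId) pair that AppShuffleId represents.
final class AppShuffleKey {
  final String appId;
  final int shuffleId;

  AppShuffleKey(String appId, int shuffleId) {
    this.appId = appId;
    this.shuffleId = shuffleId;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof AppShuffleKey)) return false;
    AppShuffleKey other = (AppShuffleKey) o;
    return shuffleId == other.shuffleId && appId.equals(other.appId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(appId, shuffleId);
  }
}

// Hypothetical server-side tracker: remembers the highest shuffleSequenceId
// seen in pushes per (appId, shuffleId), so fetch requests need not carry it.
final class ShuffleSequenceTracker {
  private final ConcurrentHashMap<AppShuffleKey, Integer> latest = new ConcurrentHashMap<>();

  // Called when a push arrives; atomically keeps the maximum sequence id.
  void recordPush(String appId, int shuffleId, int shuffleSequenceId) {
    latest.merge(new AppShuffleKey(appId, shuffleId), shuffleSequenceId, Integer::max);
  }

  // Empty if no push has been recorded for this shuffle.
  OptionalInt latestSequenceId(String appId, int shuffleId) {
    Integer id = latest.get(new AppShuffleKey(appId, shuffleId));
    return id == null ? OptionalInt.empty() : OptionalInt.of(id);
  }
}
```

Using `ConcurrentHashMap.merge` with `Integer::max` keeps the update atomic, and a late push for an older sequence id cannot move the tracked value backwards.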
> Even then we wouldn't still know whether it is finalized or not right?
If a partition is not finalized, a shuffle fetcher will never request its shuffle data, because the driver will not have a merge status for that partition.
This comes back to my original point: if the fetch side always reads the data of the latest shuffleSequenceId, then these protocols should not be modified.
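A minimal sketch of that alternative, assuming the server remembers the shuffleSequenceId at finalization time: the fetch-side lookup keeps its original `(appId, shuffleId, reduceId)` parameters and resolves the sequence id internally. `FinalizedShuffleResolver`, `onFinalize`, `mergedIndexFileName` and the file-name format are hypothetical and only illustrate the idea.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical resolver: fetch requests keep their original parameters and
// the server supplies the shuffleSequenceId it recorded at finalization.
final class FinalizedShuffleResolver {
  // "appId/shuffleId" -> shuffleSequenceId of the most recent finalized merge.
  private final Map<String, Integer> finalizedSequence = new ConcurrentHashMap<>();

  private static String key(String appId, int shuffleId) {
    return appId + "/" + shuffleId;
  }

  // Called when the merge for a given sequence id is finalized.
  void onFinalize(String appId, int shuffleId, int shuffleSequenceId) {
    finalizedSequence.merge(key(appId, shuffleId), shuffleSequenceId, Integer::max);
  }

  // Same parameters as before the change: no shuffleSequenceId on the wire.
  String mergedIndexFileName(String appId, int shuffleId, int reduceId) {
    Integer seq = finalizedSequence.get(key(appId, shuffleId));
    if (seq == null) {
      // Fetchers only request partitions the driver has merge statuses for,
      // so a missing entry means a stale or invalid request.
      throw new IllegalStateException(
          "No finalized merge for " + appId + " shuffle " + shuffleId);
    }
    // Hypothetical naming scheme, for illustration only.
    return String.format("shuffleMerged_%s_%d_%d_%d.index", appId, shuffleId, seq, reduceId);
  }
}
```

Since only finalized data is ever fetched, the sequence id the server recorded at finalization is enough to pick the right files.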
--
This is an automated message from the Apache Git Service.