Victsm commented on a change in pull request #29855:
URL: https://github.com/apache/spark/pull/29855#discussion_r497077157
##########
File path:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -373,6 +427,54 @@ public ManagedBuffer next() {
}
}
+ /**
+ * Dummy implementation of merged shuffle file manager. Suitable for when push-based shuffle
+ * is not enabled.
+ */
+ private static class NoOpMergedShuffleFileManager implements MergedShuffleFileManager {
+
+ @Override
+ public StreamCallbackWithID receiveBlockDataAsStream(PushBlockStream msg) {
+ throw new UnsupportedOperationException("Cannot handle shuffle block merge");
+ }
+
+ @Override
+ public MergeStatuses finalizeShuffleMerge(FinalizeShuffleMerge msg) throws IOException {
+ throw new UnsupportedOperationException("Cannot handle shuffle block merge");
+ }
+
+ @Override
+ public void registerApplication(String appId, String user) {
Review comment:
It's still a bit unclear at this point, especially what each of the different
schedulers would need.
Our current approach for determining the merged shuffle file directory path
is the following:
1. The implementation of MergedShuffleFileManager (the RPC handler for block
push requests) will be initialized with a directory path pattern that is
relative to each of the executor local dirs (a concept common to all
schedulers).
2. The actual path for storing the merged shuffle files for a given
application on a given host is then determined from the local dirs and the
relative path pattern, with the appId and user ID filled in.
The assumption is that once we know the local dirs for a given app, the
remaining portion of the directory path to the merged shuffle files will be
mostly the same across applications, except for the appId and user ID
portions of the path.
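To make the two steps above concrete, here is a minimal sketch of how the
path resolution could look. It is only an illustration under assumptions: the
class name, the method name, and the `{appId}` / `{user}` placeholder syntax
are hypothetical and are not taken from the PR, and the example dirs and
pattern are made up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the PR: illustrates resolving merged
// shuffle dirs from executor local dirs plus a relative path pattern.
final class MergedShuffleDirSketch {
  // Assumed placeholder syntax; the real pattern format is not specified here.
  static final String APP_ID_TOKEN = "{appId}";
  static final String USER_TOKEN = "{user}";

  /**
   * Step 1 corresponds to holding on to relativePattern; step 2 is this
   * method: fill in appId and user, then resolve against every local dir.
   */
  static List<Path> resolveMergedDirs(
      List<String> localDirs, String relativePattern, String appId, String user) {
    if (Paths.get(relativePattern).isAbsolute()) {
      // Path.resolve would silently ignore the local dir for an absolute path.
      throw new IllegalArgumentException("Pattern must be relative: " + relativePattern);
    }
    String relative = relativePattern
        .replace(APP_ID_TOKEN, appId)
        .replace(USER_TOKEN, user);
    List<Path> mergedDirs = new ArrayList<>();
    for (String localDir : localDirs) {
      mergedDirs.add(Paths.get(localDir).resolve(relative).normalize());
    }
    return mergedDirs;
  }

  public static void main(String[] args) {
    // Made-up local dirs and pattern, for illustration only.
    List<Path> dirs = resolveMergedDirs(
        Arrays.asList("/data1/local", "/data2/local"),
        "merged/{user}/{appId}", "app_1", "alice");
    for (Path dir : dirs) {
      System.out.println(dir);
    }
  }
}
```

With this shape, only the local dirs would need to come from the scheduler;
the rest of the path is derived from the pattern plus the appId and user
passed to registerApplication.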