otterc commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r642024816
##########
File path:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##########
@@ -290,16 +349,64 @@ void deleteExecutorDirs(Path[] dirs) {
}
}
+ /**
+ * Create StreamCallback for invalid push blocks with the specific error message.
+ * If specific error message is null, this StreamCallback won't throw exception in client.
+ */
+ private StreamCallbackWithID createCallbackForInvalidPushBlocks(
+ String streamId,
+ String errorMessage) {
+ return new StreamCallbackWithID() {
+ @Override
+ public String getID() {
+ return streamId;
+ }
+
+ @Override
+ public void onData(String streamId, ByteBuffer buf) {
+ // Ignore the requests. It reaches here either when a request is received after the
+ // shuffle file is finalized or when a request is for a duplicate block.
+ }
+
+ @Override
+ public void onComplete(String streamId) {
+ if (errorMessage != null) {
+ // Throw an exception here so the block data is drained from channel and server
+ // responds RpcFailure to the client.
+ throw new RuntimeException(String.format("Block %s %s", streamId, errorMessage));
+ }
+ // For duplicate block that is received before the shuffle merge finalizes, the
+ // server should respond success to the client.
+ }
+
+ @Override
+ public void onFailure(String streamId, Throwable cause) {
+ }
+ };
+ }
+
@Override
public StreamCallbackWithID receiveBlockDataAsStream(PushBlockStream msg) {
// Retrieve merged shuffle file metadata
- AppShuffleId appShuffleId = new AppShuffleId(msg.appId, msg.shuffleId);
+ AppAttemptPathsInfo appAttemptPathsInfo = getAppAttemptPathsInfo(msg.appId);
+ final String streamId = String.format("%s_%d_%d_%d",
+ OneForOneBlockPusher.SHUFFLE_PUSH_BLOCK_PREFIX, msg.shuffleId, msg.mapIndex, msg.reduceId);
+ AppAttemptShuffleId appAttemptShuffleId =
+ new AppAttemptShuffleId(msg.appId, msg.attemptId, msg.shuffleId);
+ if (appAttemptPathsInfo.attemptId != appAttemptShuffleId.attemptId) {
+ // If this Block belongs to a former application attempt, it is considered late,
+ // as only the blocks from the current application attempt will be merged
+ // TODO: [SPARK-35548] Client should be updated to handle this error.
+ return createCallbackForInvalidPushBlocks(streamId,
Review comment:
Is there a need to create a valid `StreamCallback` in this case? Can't we
just throw the RuntimeException with the
`NEWER_ATTEMPT_HAS_STARTED_MESSAGE_SUFFIX` when it tries to create the stream?
This would also make the refactoring that added
`createCallbackForInvalidPushBlocks` unnecessary.
We already throw a RuntimeException with the message `Cannot initialize merged
shuffle partition for appId %s shuffleId %s reduceId %s` from
`getOrCreateAppShufflePartitionInfo`, so we can fail immediately when creating
the stream here as well.
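
A minimal sketch of the fail-fast alternative suggested above, assuming the attempt check can run before any callback is built. `FailFastSketch`, `Callback`, and `openStream` are hypothetical stand-ins, not the PR's code, and the text given to `NEWER_ATTEMPT_HAS_STARTED_MESSAGE_SUFFIX` is a placeholder because its real value does not appear in this thread.

```java
public class FailFastSketch {
  // Placeholder text: the real constant's value is not shown in this thread.
  static final String NEWER_ATTEMPT_HAS_STARTED_MESSAGE_SUFFIX =
      "is from an older app attempt";

  /** Hypothetical stand-in for StreamCallbackWithID, reduced to its ID. */
  interface Callback {
    String getID();
  }

  /**
   * Rejects a push from a former attempt by throwing before any callback
   * exists, instead of returning a callback that throws in onComplete.
   */
  static Callback openStream(String streamId, int currentAttemptId, int msgAttemptId) {
    if (currentAttemptId != msgAttemptId) {
      throw new RuntimeException(
          String.format("Block %s %s", streamId, NEWER_ATTEMPT_HAS_STARTED_MESSAGE_SUFFIX));
    }
    // Matching attempt: hand back a callback that only reports its stream ID.
    return () -> streamId;
  }

  public static void main(String[] args) {
    System.out.println(openStream("shufflePush_0_1_2", 2, 2).getID());
    try {
      openStream("shufflePush_0_1_2", 2, 1);
    } catch (RuntimeException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

One trade-off to weigh: the comment in the diff says the exception is thrown in `onComplete` so that the block data is drained from the channel first. Whether failing at stream creation still handles the pending data correctly would need to be confirmed against the transport layer.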