mccheah commented on a change in pull request #25304:
[SPARK-28570][CORE][SHUFFLE] Make UnsafeShuffleWriter use the new API.
URL: https://github.com/apache/spark/pull/25304#discussion_r316652821
##########
File path:
core/src/main/java/org/apache/spark/shuffle/api/ShuffleExecutorComponents.java
##########
@@ -39,17 +40,39 @@
/**
* Called once per map task to create a writer that will be responsible for persisting all the
* partitioned bytes written by that map task.
- * @param shuffleId Unique identifier for the shuffle the map task is a part of
+ *
+ * @param shuffleId Unique identifier for the shuffle the map task is a part of
* @param mapId Within the shuffle, the identifier of the map task
* @param mapTaskAttemptId Identifier of the task attempt. Multiple attempts of the same map task
- * with the same (shuffleId, mapId) pair can be distinguished by the
- * different values of mapTaskAttemptId.
+ * with the same (shuffleId, mapId) pair can be distinguished by the
+ * different values of mapTaskAttemptId.
* @param numPartitions The number of partitions that will be written by the map task. Some of
-* these partitions may be empty.
+ * these partitions may be empty.
*/
ShuffleMapOutputWriter createMapOutputWriter(
int shuffleId,
int mapId,
long mapTaskAttemptId,
int numPartitions) throws IOException;
+
+ /**
+ * An optional extension for creating a map output writer that can optimize the transfer of a
+ * single partition file, as the entire result of a map task, to the backing store.
+ * <p>
+ * Most implementations should return the default {@link Optional#empty()} to indicate that
+ * they do not support this optimization. This primarily is for backwards-compatibility in
Review comment:
Truth be told, even plugins that support a remote FS move are unlikely to
support this well: one would still have to transfer the whole file up to the
remote storage layer, and that could just as easily be done by writing the data
from the file through an output stream.
I think only implementations that stage the files locally could support this
in any meaningful way.
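
To make the distinction concrete, here is a minimal sketch of the two paths. It
is not Spark code: the class and method names are hypothetical, and a
ByteArrayOutputStream or any other OutputStream stands in for the remote
storage layer. It only illustrates why a local move can skip the byte copy
while a remote upload cannot.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; none of these names exist in Spark.
public class SingleFileTransferSketch {

  // Local staging: the single spill file is renamed into its final location.
  // When source and target are on the same filesystem this is typically a
  // metadata operation, so no bytes are re-read or re-written.
  public static void transferByLocalMove(Path spillFile, Path stagedFile)
      throws IOException {
    Files.move(spillFile, stagedFile, StandardCopyOption.REPLACE_EXISTING);
  }

  // Remote store: every byte has to be pushed to the backing store anyway,
  // so this is equivalent to writing the file through an output stream.
  // Returns the number of bytes copied.
  public static long transferByStreaming(Path spillFile, OutputStream remote)
      throws IOException {
    return Files.copy(spillFile, remote);
  }
}
```

In the streaming case the cost is proportional to the file size regardless of
which API is used, which is why a dedicated single-file transfer hook adds
little for such plugins.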