mccheah commented on a change in pull request #25304: [SPARK-28570][CORE][SHUFFLE] Make UnsafeShuffleWriter use the new API.
URL: https://github.com/apache/spark/pull/25304#discussion_r316652821
 
 

 ##########
 File path: core/src/main/java/org/apache/spark/shuffle/api/ShuffleExecutorComponents.java
 ##########
 @@ -39,17 +40,39 @@
   /**
    * Called once per map task to create a writer that will be responsible for persisting all the
    * partitioned bytes written by that map task.
-   *  @param shuffleId Unique identifier for the shuffle the map task is a part of
+   *
+   * @param shuffleId Unique identifier for the shuffle the map task is a part of
    * @param mapId Within the shuffle, the identifier of the map task
    * @param mapTaskAttemptId Identifier of the task attempt. Multiple attempts of the same map task
- *                         with the same (shuffleId, mapId) pair can be distinguished by the
- *                         different values of mapTaskAttemptId.
+   *                         with the same (shuffleId, mapId) pair can be distinguished by the
+   *                         different values of mapTaskAttemptId.
    * @param numPartitions The number of partitions that will be written by the map task. Some of
-*                      these partitions may be empty.
+   *                      these partitions may be empty.
    */
   ShuffleMapOutputWriter createMapOutputWriter(
       int shuffleId,
       int mapId,
       long mapTaskAttemptId,
       int numPartitions) throws IOException;
+
+  /**
+   * An optional extension for creating a map output writer that can optimize the transfer of a
+   * single partition file, as the entire result of a map task, to the backing store.
+   * <p>
+   * Most implementations should return the default {@link Optional#empty()} to indicate that
+   * they do not support this optimization. This primarily is for backwards-compatibility in
 
 Review comment:
   Truth be told, even plugins that can move files to a remote FS would be unlikely
   to support this well - one would still have to transfer the whole file up to the
   remote storage layer, and that could just as easily be done by writing the data
   from the file through an output stream.
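   
   To make that concrete, here is a hand-wavy sketch of about all a remote-FS plugin could really do with this hook - the `RemoteStore` interface and all names here are made up for illustration, not part of this API:
   
   ```java
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.OutputStream;
   import java.nio.file.Files;
   import java.nio.file.Path;
   
   final class RemoteUploadSketch {
   
     /** Stand-in for whatever client the plugin wraps (S3, HDFS, etc.). */
     interface RemoteStore {
       OutputStream create(String key) throws IOException;
     }
   
     /** Uploads a finished map-output file by piping it through a plain stream. */
     static void uploadMapOutput(RemoteStore store, Path localFile, String key)
         throws IOException {
       try (InputStream in = Files.newInputStream(localFile);
            OutputStream out = store.create(key)) {
         byte[] buffer = new byte[8192];
         int read;
         while ((read = in.read(buffer)) != -1) {
           out.write(buffer, 0, read);
         }
       }
     }
   
     private RemoteUploadSketch() {}
   }
   ```
   
   There is no saving over the generic write path - the bytes cross the wire either way.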
   
   I think only implementations that stage the files locally could support this 
in any meaningful way at all.
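   
   A local-staging implementation, by contrast, can make the hook genuinely cheap, e.g. by adopting the writer's finished file with a rename instead of copying any bytes. Again just a sketch with invented names:
   
   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardCopyOption;
   
   final class LocalAdoptSketch {
   
     /**
      * Moves the finished map-output file into its final location. On the same
      * filesystem this is an O(1) rename with no byte copy - the kind of win the
      * single-file transfer extension is meant to enable.
      */
     static void adoptMapOutput(Path writtenFile, Path finalLocation) throws IOException {
       Files.move(writtenFile, finalLocation, StandardCopyOption.ATOMIC_MOVE);
     }
   
     private LocalAdoptSketch() {}
   }
   ```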
