squito commented on a change in pull request #25304: 
[SPARK-28570][CORE][SHUFFLE] Make UnsafeShuffleWriter use the new API.
URL: https://github.com/apache/spark/pull/25304#discussion_r317758804
 
 

 ##########
 File path: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java
 ##########
 @@ -273,57 +259,93 @@ void forceSorterToSpill() throws IOException {
    *
    * @return the partition lengths in the merged file.
    */
-  private long[] mergeSpills(SpillInfo[] spills,
-      ShuffleMapOutputWriter mapWriter) throws IOException {
+  private long[] mergeSpills(SpillInfo[] spills) throws IOException {
+    long[] partitionLengths;
+    if (spills.length == 0) {
+      final ShuffleMapOutputWriter mapWriter = shuffleExecutorComponents
+          .createMapOutputWriter(
+              shuffleId,
+              mapId,
+              taskContext.taskAttemptId(),
+              partitioner.numPartitions());
+      mapWriter.commitAllPartitions();
+      return new long[partitioner.numPartitions()];
+    } else if (spills.length == 1) {
+      Optional<SingleFileShuffleMapOutputWriter> maybeSingleFileWriter =
+          shuffleExecutorComponents.createSingleFileMapOutputWriter(
+              shuffleId, mapId, taskContext.taskAttemptId());
+      if (maybeSingleFileWriter.isPresent()) {
+        // Here, we don't need to perform any metrics updates because the bytes written to this
+        // output file would have already been counted as shuffle bytes written.
 
 Review comment:
   I dunno if there is a great alternative.  We could say it's the job of 
individual implementations to increment the metrics, and then move this comment 
into `LocalDiskSingleSpillMapOutputWriter` to explain why the metrics aren't 
incremented.  But we're specifically trying to avoid exposing metrics to the 
API.  You could also have `transferMapSpillFile()` return the number of bytes 
written, and then the existing implementation would return 0.
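   
   A rough sketch of that last option, purely for illustration. The interface and class names here (`ByteReportingSpillWriter`, `MovingSpillWriter`, `CopyingSpillWriter`) are made up and are not part of this PR; only the `transferMapSpillFile` method name comes from the discussion, and the `long` return type is the hypothetical change. The assumption is that a local-disk style writer just moves the spill file (nothing new to count), while some other store would have to copy it.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/** Hypothetical variant of the single-spill writer; not the interface in this PR. */
interface ByteReportingSpillWriter {
  /** Returns the number of bytes the caller should add to the shuffle write metrics. */
  long transferMapSpillFile(File mapSpillFile, long[] partitionLengths) throws IOException;
}

/** Mirrors the local-disk case: the spill file is simply moved into place. */
final class MovingSpillWriter implements ByteReportingSpillWriter {
  private final File outputFile;

  MovingSpillWriter(File outputFile) {
    this.outputFile = outputFile;
  }

  @Override
  public long transferMapSpillFile(File mapSpillFile, long[] partitionLengths)
      throws IOException {
    Files.move(mapSpillFile.toPath(), outputFile.toPath(),
        StandardCopyOption.REPLACE_EXISTING);
    // These bytes were already counted when the spill was written, so report none.
    return 0L;
  }
}

/** Imagined store that has to copy the spill, so the copy counts as new bytes written. */
final class CopyingSpillWriter implements ByteReportingSpillWriter {
  private final File outputFile;

  CopyingSpillWriter(File outputFile) {
    this.outputFile = outputFile;
  }

  @Override
  public long transferMapSpillFile(File mapSpillFile, long[] partitionLengths)
      throws IOException {
    Files.copy(mapSpillFile.toPath(), outputFile.toPath(),
        StandardCopyOption.REPLACE_EXISTING);
    // Report the size of the copy so the caller can update its metrics.
    return Files.size(outputFile.toPath());
  }
}
```

   With something like this, the caller would add the returned value to its write metrics and would not need to know which kind of writer it has. The cost is one more return value in the API that the existing implementation always sets to 0.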
   
   It all kinda feels like overkill to me.  @gczsjdy I agree it's possible for 
another store to take advantage of this, but do you have a specific case in 
mind?  I'd like to avoid adding too many things to the API and keep things 
simple (allowing odd cases only where they're needed to support the existing 
implementation).
