mridulm commented on code in PR #45661:
URL: https://github.com/apache/spark/pull/45661#discussion_r1540475779


##########
core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java:
##########
@@ -198,7 +201,8 @@ private void writeSortedFile(boolean isFinalFile) {
     // spark.shuffle.compress instead of spark.shuffle.spill.compress, so we 
need to use
     // createTempShuffleBlock here; see SPARK-3426 for more details.
     final Tuple2<TempShuffleBlockId, File> spilledFileInfo =
-      blockManager.diskBlockManager().createTempShuffleBlock();
+      
finalDataFileDir.map(blockManager.diskBlockManager()::createTempShuffleBlockInDir)
+        .orElseGet(blockManager.diskBlockManager()::createTempShuffleBlock);

Review Comment:
   ```suggestion
         finalDataFileDir.filter(v -> 
spills.isEmpty()).map(blockManager.diskBlockManager()::createTempShuffleBlockInDir)
           .orElseGet(blockManager.diskBlockManager()::createTempShuffleBlock);
   ```
   
   We need this only when there is a single output file, else we want to spread 
it.



##########
core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java:
##########
@@ -198,7 +201,8 @@ private void writeSortedFile(boolean isFinalFile) {
     // spark.shuffle.compress instead of spark.shuffle.spill.compress, so we 
need to use
     // createTempShuffleBlock here; see SPARK-3426 for more details.
     final Tuple2<TempShuffleBlockId, File> spilledFileInfo =
-      blockManager.diskBlockManager().createTempShuffleBlock();

Review Comment:
   That would assume the final output has to go to 
`blockResolver.getDataFile(shuffleId, mapId)` right @cloud-fan  ?
   Currently at this layer we do not make that assumption ...
   
   I was initially toying with the idea of passing `mapId` and `shuffleId` as 
constructor params ... and do something similar when I realized this would make 
assumptions that the code currently does not make - and so why the base 
directory is being passed around.
   
   (And then ofcourse I thought we could solve it in 
`LocalDiskSingleSpillMapOutputWriter` 
[here](https://github.com/apache/spark/pull/45661#issuecomment-2020527368), but 
was completely wrong :-( ).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to