mridulm commented on code in PR #45661:
URL: https://github.com/apache/spark/pull/45661#discussion_r1540475779
##########
core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java:
##########
@@ -198,7 +201,8 @@ private void writeSortedFile(boolean isFinalFile) {
// spark.shuffle.compress instead of spark.shuffle.spill.compress, so we
need to use
// createTempShuffleBlock here; see SPARK-3426 for more details.
final Tuple2<TempShuffleBlockId, File> spilledFileInfo =
- blockManager.diskBlockManager().createTempShuffleBlock();
+
finalDataFileDir.map(blockManager.diskBlockManager()::createTempShuffleBlockInDir)
+ .orElseGet(blockManager.diskBlockManager()::createTempShuffleBlock);
Review Comment:
```suggestion
finalDataFileDir.filter(v ->
spills.isEmpty()).map(blockManager.diskBlockManager()::createTempShuffleBlockInDir)
.orElseGet(blockManager.diskBlockManager()::createTempShuffleBlock);
```
We need this only when there is a single output file, else we want to spread
it.
##########
core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java:
##########
@@ -198,7 +201,8 @@ private void writeSortedFile(boolean isFinalFile) {
// spark.shuffle.compress instead of spark.shuffle.spill.compress, so we
need to use
// createTempShuffleBlock here; see SPARK-3426 for more details.
final Tuple2<TempShuffleBlockId, File> spilledFileInfo =
- blockManager.diskBlockManager().createTempShuffleBlock();
Review Comment:
That would assume the final output has to go to
`blockResolver.getDataFile(shuffleId, mapId)` right @cloud-fan ?
Currently at this layer we do not make that assumption ...
I was initially toying with the idea of passing `mapId` and `shuffleId` as
constructor params ... and do something similar when I realized this would make
assumptions that the code currently does not make - and so why the base
directory is being passed around.
(And then ofcourse I thought we could solve it in
`LocalDiskSingleSpillMapOutputWriter`
[here](https://github.com/apache/spark/pull/45661#issuecomment-2020527368), but
was completely wrong :-( ).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]