squito commented on a change in pull request #25304:
[SPARK-28570][CORE][SHUFFLE] Make UnsafeShuffleWriter use the new API.
URL: https://github.com/apache/spark/pull/25304#discussion_r309442177
##########
File path:
core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java
##########
@@ -281,25 +273,21 @@ void forceSorterToSpill() throws IOException {
*
* @return the partition lengths in the merged file.
*/
- private long[] mergeSpills(SpillInfo[] spills, File outputFile) throws IOException {
+ private long[] mergeSpills(SpillInfo[] spills,
+ ShuffleMapOutputWriter mapWriter) throws IOException {
final boolean compressionEnabled = (boolean) sparkConf.get(package$.MODULE$.SHUFFLE_COMPRESS());
final CompressionCodec compressionCodec = CompressionCodec$.MODULE$.createCodec(sparkConf);
final boolean fastMergeEnabled =
  (boolean) sparkConf.get(package$.MODULE$.SHUFFLE_UNDAFE_FAST_MERGE_ENABLE());
final boolean fastMergeIsSupported = !compressionEnabled ||
  CompressionCodec$.MODULE$.supportsConcatenationOfSerializedStreams(compressionCodec);
final boolean encryptionEnabled = blockManager.serializerManager().encryptionEnabled();
+ final int numPartitions = partitioner.numPartitions();
+ long[] partitionLengths = new long[numPartitions];
try {
if (spills.length == 0) {
- new FileOutputStream(outputFile).close(); // Create an empty file
- return new long[partitioner.numPartitions()];
- } else if (spills.length == 1) {
- // Here, we don't need to perform any metrics updates because the bytes written to this
- // output file would have already been counted as shuffle bytes written.
- Files.move(spills[0].file, outputFile);
Review comment:
Nothing very clean -- just adding a special new optional method for this
case, which will probably be ignored by every non-local-disk implementation
(like we do for channels), e.g. something like:
```scala
val singleSpillFileHandlerOption = mapWriter.getSingleSpillFileWriter
if (singleSpillFileHandlerOption.isDefined) {
  singleSpillFileHandlerOption.get.writeAllPartitionsFromOneSpillFile(file)
}
}
```
(probably horrible names, maybe even wrong place to put that ...)
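
To make the shape of that suggestion a bit more concrete, here is a rough Java sketch of the same idea. Every name in it (`SingleSpillFileWriter`, `MapOutputWriterSketch`, `LocalDiskWriterSketch`, `tryFastPath`) is hypothetical and only stands in for whatever the real API ends up being; it is not the actual `ShuffleMapOutputWriter` interface. The assumption is that the optional hook defaults to "not supported", so only a local-disk implementation opts in and keeps the cheap rename, while everything else falls through to the regular merge path.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

public class SingleSpillSketch {

  /** Hypothetical handler that takes over a whole spill file in one step. */
  interface SingleSpillFileWriter {
    void writeAllPartitionsFromOneSpillFile(File spillFile) throws IOException;
  }

  /** Hypothetical stand-in for the map output writer discussed above. */
  interface MapOutputWriterSketch {
    // Default: no fast path, so non-local-disk backends can simply ignore it.
    default Optional<SingleSpillFileWriter> getSingleSpillFileWriter() {
      return Optional.empty();
    }
  }

  /** Local-disk variant: a single spill file becomes the output via a move. */
  static final class LocalDiskWriterSketch implements MapOutputWriterSketch {
    private final Path outputFile;

    LocalDiskWriterSketch(Path outputFile) {
      this.outputFile = outputFile;
    }

    @Override
    public Optional<SingleSpillFileWriter> getSingleSpillFileWriter() {
      return Optional.of(spill ->
          Files.move(spill.toPath(), outputFile, StandardCopyOption.REPLACE_EXISTING));
    }
  }

  /** Returns true if the fast path handled the spill; false means the caller must merge. */
  static boolean tryFastPath(MapOutputWriterSketch mapWriter, File spillFile)
      throws IOException {
    Optional<SingleSpillFileWriter> handler = mapWriter.getSingleSpillFileWriter();
    if (handler.isPresent()) {
      handler.get().writeAllPartitionsFromOneSpillFile(spillFile);
      return true;
    }
    return false;
  }
}
```

In `mergeSpills`, the `spills.length == 1` branch would then call something like `tryFastPath(mapWriter, spills[0].file)` and only go through the per-partition copy when it returns false. Metrics and partition-length bookkeeping are left out of the sketch.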