wankunde commented on PR #45661:
URL: https://github.com/apache/spark/pull/45661#issuecomment-2021723677

   > `ExecutorDiskUtils.getFilePath` and `DiskBlockManager.getFile` determine the layout of the files ... and given this is specific to an optimization in `LocalDiskShuffleExecutorComponents` (which determines the output file), why not simply modify `LocalDiskSingleSpillMapOutputWriter.transferMapSpillFile` to leverage this layout information and 'host' the final output on the same disk?
   > 
   > `KubernetesLocalDiskShuffleExecutorComponents.recoverDiskStore`, for example, does something similar as well (using the layout info to recover - not this specific layout).
   > 
   > ```
   >   public void transferMapSpillFile(
   >       File mapSpillFile,
   >       long[] partitionLengths,
   >       long[] checksums) throws IOException {
   >     // The map spill file already has the proper format, and it contains all of the partition data.
   >     // So just transfer it directly to the destination without any merging.
   >     File parent = ExecutorDiskUtils.getLocalDir(mapSpillFile);
   >     File outputFile = blockResolver.getDataFile(shuffleId, mapId, Some(Array(parent.getAbsolutePath)));
   >     File tempFile = Utils.tempFileWith(outputFile);
   >     Files.move(mapSpillFile.toPath(), tempFile.toPath());
   >     blockResolver.writeMetadataFileAndCommit(shuffleId, mapId, partitionLengths, checksums, tempFile);
   >   }
   > ```
   > 
   > This is only going to marginally alleviate the issue though; the slow disks are the root cause here, and their impact will manifest in a lot of other ways as well. (We built push-based shuffle when we started seeing very high disk issues of this sort - though unfortunately it is only supported on YARN right now.)
   
   Thanks @mridulm
   Do you mean changing the location of the final shuffle data file? I would prefer to keep the current shuffle file layout: the executors write the shuffle files, and then the external shuffle service can read them without any additional work.
   `KubernetesLocalDiskShuffleExecutorComponents.recoverDiskStore(sparkConf, blockManager)` does perform a heavy disk scan, but only from the `KubernetesLocalDiskShuffleExecutorComponents.initializeExecutor()` method; the internal layout of the shuffle data files is the same as on YARN.
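   For context, the layout both sides refer to is the deterministic hash-based mapping in `DiskBlockManager.getFile`: the block filename is hashed to pick a local dir and then a sub-directory, so the executor and the external shuffle service resolve the same path independently. Below is a simplified Java paraphrase of that scheme (the real code is Scala inside Spark; the dir list, sub-dir count, and filenames here are purely illustrative):

   ```java
   import java.io.File;

   // Sketch of the hash-based shuffle file layout: filename -> localDir -> subDir.
   // Any process that knows localDirs and subDirsPerLocalDir can resolve the
   // same path for a given block, with no coordination between them.
   public class ShuffleFileLayout {
       // Non-negative hash of the block filename (guarding Integer.MIN_VALUE,
       // whose absolute value overflows).
       static int nonNegativeHash(String s) {
           int h = s.hashCode();
           return h != Integer.MIN_VALUE ? Math.abs(h) : 0;
       }

       static File getFile(String[] localDirs, int subDirsPerLocalDir, String filename) {
           int hash = nonNegativeHash(filename);
           int dirId = hash % localDirs.length;
           int subDirId = (hash / localDirs.length) % subDirsPerLocalDir;
           // Sub-directories are conventionally named with two hex digits.
           File subDir = new File(localDirs[dirId], String.format("%02x", subDirId));
           return new File(subDir, filename);
       }

       public static void main(String[] args) {
           String[] dirs = {"/disk1/spark-local", "/disk2/spark-local"};
           File f = ShuffleFileLayout.getFile(dirs, 64, "shuffle_0_1_0.data");
           System.out.println(f.getPath());
           // The same filename always resolves to the same path:
           System.out.println(f.equals(ShuffleFileLayout.getFile(dirs, 64, "shuffle_0_1_0.data")));
       }
   }
   ```

   This determinism is why keeping the existing layout lets the external shuffle service find the files "without any additional work", and it is also what `recoverDiskStore` relies on when it re-registers files found on disk.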
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

