cxzl25 opened a new pull request, #47733:
URL: https://github.com/apache/spark/pull/47733

   ### What changes were proposed in this pull request?
   This PR adds a separate buffer size configuration for merging spill files in 
`UnsafeShuffleWriter`.
   
   Introduce `spark.shuffle.file.merge.buffer` configuration.
   
   ### Why are the changes needed?
   
   `UnsafeShuffleWriter#mergeSpillsWithFileStream` uses 
`spark.shuffle.file.buffer` as the buffer size for reading spill files, and 
this buffer is allocated off-heap.
   
   While spilling, a larger buffer is desirable. However, when a task produces 
many spill files, `UnsafeShuffleWriter#mergeSpillsWithFileStream` has to 
allocate a large amount of off-heap memory to read them, which makes the 
executor likely to be killed by YARN.
   
   
https://github.com/apache/spark/blob/e72d21c299a450e48b3cf6e5d36b8f3e9a568088/core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java#L372-L375
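   To illustrate the concern, here is a minimal sketch of the arithmetic. It 
assumes the merge step opens one direct (off-heap) read buffer per spill file, 
which is how the linked code appears to behave. The class name, method names, 
spill count, and buffer sizes are all hypothetical and are not taken from 
Spark.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class MergeBufferEstimate {

    /** Total direct memory needed if every spill file gets its own read buffer. */
    public static long totalOffHeapBytes(int numSpills, long bufferSizeBytes) {
        return (long) numSpills * bufferSizeBytes;
    }

    /**
     * Allocates one direct buffer per spill file. This only mimics the
     * assumed allocation pattern; it is not Spark's implementation.
     */
    public static List<ByteBuffer> allocateReadBuffers(int numSpills, int bufferSizeBytes) {
        List<ByteBuffer> buffers = new ArrayList<>(numSpills);
        for (int i = 0; i < numSpills; i++) {
            buffers.add(ByteBuffer.allocateDirect(bufferSizeBytes));
        }
        return buffers;
    }

    public static void main(String[] args) {
        int numSpills = 200;                  // hypothetical number of spill files
        long writeBufferBytes = 1024L * 1024; // hypothetical large spark.shuffle.file.buffer (1 MiB)
        long mergeBufferBytes = 32L * 1024;   // hypothetical smaller merge buffer (32 KiB)

        double mib = 1024.0 * 1024.0;
        System.out.printf("Shared buffer size: %.2f MiB off-heap during merge%n",
                totalOffHeapBytes(numSpills, writeBufferBytes) / mib);
        System.out.printf("Separate merge buffer: %.2f MiB off-heap during merge%n",
                totalOffHeapBytes(numSpills, mergeBufferBytes) / mib);
    }
}
```

   Because the total scales with the number of spill files, a separate, 
smaller merge buffer bounds the off-heap usage of the merge step without 
shrinking the buffer used while spilling.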
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Verified in a production environment.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

