Ngone51 commented on pull request #32385: URL: https://github.com/apache/spark/pull/32385#issuecomment-843722683
@tgravescs Thanks for the good points! I did find some perf regression while benchmarking the change. I'll double-check it and try to eliminate it if possible.

> Also at the high level how does this affect other shuffle work going on - like merging and pluggable? Is it independent of that or would need to be implemented?

For merging, it needs an extension to send the checksum values along with the block data while merging. The same extension is also needed for the decommission feature.

For pluggable, my current implementation lives in `LocalDiskShuffleMapOutputWriter`, which is the default shuffle writer plugin for Spark. That means other custom plugins would each need their own implementation for checksum support. I took that approach because it seemed easier and clearer to implement at the time. An alternative that would support checksums for all plugins, i.e. make it a built-in feature, is to implement it in `DiskBlockObjectWriter`/`ShufflePartitionPairsWriter`, which sit upstream of the shuffle I/O plugin. I need to investigate this more.
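For context, the general technique under discussion can be sketched as wrapping a partition's output stream so a checksum is accumulated as bytes are written, avoiding a second pass over the data. This is a minimal illustration, not the PR's actual code; the helper name `wrapWithChecksum` and the choice of `Adler32` are assumptions (any `java.util.zip.Checksum` implementation would work the same way):

```scala
import java.io.{FileOutputStream, OutputStream}
import java.util.zip.{Adler32, CheckedOutputStream}

// Hypothetical helper, for illustration only: wraps an output stream so the
// checksum is computed inline as the shuffle writer emits partition bytes.
object ChecksumSketch {
  def wrapWithChecksum(out: OutputStream): (OutputStream, Adler32) = {
    val checksum = new Adler32()
    (new CheckedOutputStream(out, checksum), checksum)
  }

  def main(args: Array[String]): Unit = {
    val (stream, checksum) = wrapWithChecksum(new FileOutputStream("part_0.data"))
    try {
      stream.write("shuffle partition bytes".getBytes("UTF-8"))
    } finally {
      stream.close()
    }
    // This per-partition value is what would be recorded (e.g. alongside the
    // index file) or sent along with the block data during merging.
    println(s"partition 0 checksum = ${checksum.getValue}")
  }
}
```

Placing such a wrapper in `DiskBlockObjectWriter`/`ShufflePartitionPairsWriter` rather than in `LocalDiskShuffleMapOutputWriter` is what would make the checksum available to every shuffle I/O plugin rather than only the default one.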
