gaoyajun02 opened a new pull request, #46934: URL: https://github.com/apache/spark/pull/46934
### What changes were proposed in this pull request?

Add a consistency check on mapIds between the push-merged block meta returned by the server side and the partition-level bitmap held on the driver side for reduce tasks. If any mapIds are found missing, fall back to fetching the original shuffle blocks. This end-to-end check helps avoid data loss during the shuffle read phase when reduce tasks fetch merged data.

### Why are the changes needed?

ShuffleBlockFetcherIterator initializes requests based on the merge status and map status from the driver side. The merge status's partition-level bitmap (mapIds) comes from the mapTracker maintained in the shuffle service's memory, but the actual mapIds used when fetching chunk data come from the shuffle service's metaFile. There is no consistency check between the two. When the server encounters issues such as disk failures, the mapIds in the mapTracker and the metaFile can become inconsistent, which ultimately results in data loss when reduce tasks fetch merged data.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No
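The check described above can be illustrated with a small sketch. This is not the code from the PR: the class and method names (`MergedMetaCheck`, `missingMapIds`, `shouldFallBack`, `bitmapOf`) are hypothetical, and `java.util.BitSet` is used as a stand-in for whatever bitmap type the real implementation uses. The assumption is that the driver-side partition bitmap lists the mapIds expected to be merged, and that the server-side meta exposes one bitmap of mapIds per chunk.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of the mapId consistency check; not Spark's actual API. */
public class MergedMetaCheck {

    /** Builds a bitmap with the given map ids set (helper for the example). */
    public static BitSet bitmapOf(int... mapIds) {
        BitSet bits = new BitSet();
        for (int id : mapIds) {
            bits.set(id);
        }
        return bits;
    }

    /**
     * Returns the map ids that the driver-side bitmap expects but that no
     * chunk bitmap from the server-side meta contains.
     */
    public static BitSet missingMapIds(BitSet driverBitmap, List<BitSet> metaChunkBitmaps) {
        // Union of all map ids actually recorded in the meta file's chunks.
        BitSet covered = new BitSet();
        for (BitSet chunk : metaChunkBitmaps) {
            covered.or(chunk);
        }
        // Clone so the caller's driver-side bitmap is left untouched.
        BitSet missing = (BitSet) driverBitmap.clone();
        missing.andNot(covered);
        return missing;
    }

    /** True when the reducer should fetch the original shuffle blocks instead. */
    public static boolean shouldFallBack(BitSet driverBitmap, List<BitSet> metaChunkBitmaps) {
        return !missingMapIds(driverBitmap, metaChunkBitmaps).isEmpty();
    }

    public static void main(String[] args) {
        BitSet driver = bitmapOf(0, 1, 2, 3);
        // The meta file lost map id 2, for example after a disk failure.
        List<BitSet> chunks = List.of(bitmapOf(0, 1), bitmapOf(3));
        System.out.println("missing mapIds: " + missingMapIds(driver, chunks));
        System.out.println("fall back: " + shouldFallBack(driver, chunks));
    }
}
```

In this sketch the comparison is one-directional: only mapIds that the driver expects but the meta lacks trigger the fallback. Whether the actual patch also treats extra mapIds in the meta as an inconsistency is not stated in the description.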
