wankunde commented on PR #37922: URL: https://github.com/apache/spark/pull/37922#issuecomment-1250635526
> We should decouple current implementation details when making protocol changes, and make it extensible for future evolution.
>
> In this case though, it is much more straightforward - there is an existing use case which requires the shuffle merge id. When retrying an indeterminate stage, we should clean up merged shuffle data for the previous stage attempt (in `submitMissingTasks`, before `unregisterAllMapAndMergeOutput`) - and given the potential race conditions there, we don't want `RemoveShuffleMerge` to clean up for the next attempt (when we add support for this).
>
> This specific change can be done in a follow-up PR though - I want to get the basic mechanics working in this PR, and ensure the cleanup use case is handled - before looking at further enhancements.

Since the push-based shuffle service automatically cleans up the old shuffle merge data, we don't need to send a `RemoveShuffleMerge` RPC for a new shuffle merge, right? The only scenario I can think of now where a cleanup RPC is needed is when the Spark job completes. Can we think of other scenarios?
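To make the attempt-guard idea concrete, here is a minimal sketch (hypothetical class and method names, not Spark's actual shuffle service code) of how a shuffle service could carry the `shuffleMergeId` in the remove request and ignore a stale `RemoveShuffleMerge` for a previous stage attempt, so a late RPC never deletes the next attempt's merged data:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the service tracks the current shuffleMergeId per
// shuffleId and only honors a remove request whose mergeId is at least as
// new as the one it is currently serving.
class MergedShuffleStore {
    // shuffleId -> latest shuffleMergeId registered for that shuffle
    private final Map<Integer, Integer> currentMergeId = new ConcurrentHashMap<>();

    // A new stage attempt registers a higher shuffleMergeId; keep the max.
    void registerMerge(int shuffleId, int shuffleMergeId) {
        currentMergeId.merge(shuffleId, shuffleMergeId, Math::max);
    }

    // Returns true if merged data was removed; a request carrying an older
    // shuffleMergeId than the current one is treated as stale and ignored,
    // which protects the next attempt's data from a racing cleanup RPC.
    boolean removeShuffleMerge(int shuffleId, int shuffleMergeId) {
        Integer current = currentMergeId.get(shuffleId);
        if (current == null || current > shuffleMergeId) {
            return false; // stale or unknown request: leave newer data intact
        }
        currentMergeId.remove(shuffleId);
        return true;
    }
}
```

With this guard, the driver can fire `RemoveShuffleMerge` before re-registering the retried attempt without worrying about message ordering: whichever order the RPCs arrive in, only data belonging to an attempt at or below the requested mergeId is dropped.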
