ivoson commented on PR #54136: URL: https://github.com/apache/spark/pull/54136#issuecomment-3850928592
> Is it possible to switch the initialization order between BlockManager initialization and ShuffleManager initialization?
>
> If impossible, I actually think we could add a new flag in BlockManagerInfo (or a pending list for those initial BlockManagers, for better memory utilization) to indicate whether the executor is ready for shuffle migration, set by sending an RPC signal (similar to `LaunchedExecutor`) after the ShuffleManager is initialized.
>
> TBH, the current fix looks a bit complex to me.

I have thought about the other options:

It's hard to reorder the initialization steps, since we need to register the BlockManager to get the `BlockManagerId` before the executor starts heartbeating. Moving the executor heartbeat after ShuffleManager initialization could cause heartbeat timeouts.

Adding a new flag in `BlockManagerInfo` would introduce a new stage and a new RPC protocol between the driver and executors. I'd like to avoid that, since the issue only affects shuffle migration under a race condition that should be pretty rare.
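For comparison, the reviewer's alternative (a per-executor readiness flag on the driver, flipped by an extra executor-to-driver RPC once the ShuffleManager is initialized) can be sketched as below. This is a language-neutral illustration only. `MigrationTargetRegistry`, `on_shuffle_manager_ready`, and the other names are hypothetical and are not Spark APIs. The sketch assumes that BlockManager registration always precedes ShuffleManager initialization, and it makes visible the extra driver-side state and new RPC handler that the reply above prefers to avoid.

```python
from typing import Dict, List


class MigrationTargetRegistry:
    """Hypothetical driver-side bookkeeping; NOT a Spark API."""

    def __init__(self) -> None:
        # executor id -> True once the executor reports its ShuffleManager is initialized
        self._shuffle_ready: Dict[str, bool] = {}

    def on_block_manager_registered(self, executor_id: str) -> None:
        # Assumption: BlockManager registers before ShuffleManager init, so start as "not ready".
        self._shuffle_ready.setdefault(executor_id, False)

    def on_shuffle_manager_ready(self, executor_id: str) -> None:
        # Handler for a hypothetical new RPC (similar in spirit to LaunchedExecutor).
        if executor_id in self._shuffle_ready:
            self._shuffle_ready[executor_id] = True

    def on_executor_removed(self, executor_id: str) -> None:
        self._shuffle_ready.pop(executor_id, None)

    def shuffle_migration_targets(self, decommissioning_id: str) -> List[str]:
        # Offer only peers that signalled readiness, excluding the executor being drained.
        return [
            eid
            for eid, ready in self._shuffle_ready.items()
            if ready and eid != decommissioning_id
        ]


registry = MigrationTargetRegistry()
for eid in ("exec-1", "exec-2", "exec-3"):
    registry.on_block_manager_registered(eid)
registry.on_shuffle_manager_ready("exec-1")
registry.on_shuffle_manager_ready("exec-3")
# exec-2 has a registered BlockManager but its ShuffleManager is not initialized yet.
print(registry.shuffle_migration_targets("exec-3"))  # ['exec-1']
```

The readiness filter closes the race, but only at the cost of the new protocol message and the per-executor state, which is the trade-off weighed against the rarity of the race in the reply.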
