Ngone51 commented on pull request #34536: URL: https://github.com/apache/spark/pull/34536#issuecomment-964738126
> SparkListenerExecutorAdded and SparkListenerExecutorRemoved are distinct from blockmanager events. Based on what I currently see, can you clarify why SparkListenerExecutorRemoved needs to be fired ?

First of all, there is a case (reported in SPARK-35011) where the executor doesn't exist in the scheduler backend but still exists in `BlockManagerMaster` (as a registered `BlockManager`). In this case, only a `SparkListenerBlockManagerAdded` event is fired, during `BlockManager` registration. On the `AppStatusListener` side, whenever there's a `SparkListenerExecutorAdded` or a `SparkListenerBlockManagerAdded`, it creates a live executor entity for the executor. Therefore, in the SPARK-35011 case, the UI shows a live executor even though the executor is actually dead.

Fortunately, for such registered `BlockManager`s, `HeartbeatReceiver.expireDeadHosts` removes them in the end and fires a `SparkListenerBlockManagerRemoved` during removal. Note that no `SparkListenerExecutorRemoved` is fired, since the scheduler backend (`executorDataMap`) no longer contains the executor. However, `AppStatusListener` only removes a live executor from the UI on `SparkListenerExecutorRemoved`, not on `SparkListenerBlockManagerRemoved`. Therefore, we need to fire a separate `SparkListenerExecutorRemoved` for it.

> If there is downstream use of blockmanager and executor events interchangably, we should fix that instead of duplicating event ? (I am assuming reference to AppStatusListener was for this ?)

Yes, it's `AppStatusListener` that needs the event. If we fixed this in `AppStatusListener` instead, we'd lose the exact executor loss reason in the UI (`SparkListenerExecutorRemoved` contains a loss reason field, but `SparkListenerBlockManagerRemoved` doesn't). So I chose to duplicate the event rather than change `AppStatusListener`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
