JoshRosen commented on PR #37282:
URL: https://github.com/apache/spark/pull/37282#issuecomment-1195799851
> I have a question if one application need register many
> QueryExecutionListener, it seems cause performance fallback.
Yes: if an application _doesn't_ use QueryExecutionListener then this PR
will improve performance, but performance will be unchanged for applications
where each SparkSession registers a QueryExecutionListener.
To improve performance in that case, we can consider restructuring the code
so that only one listener is registered for each SparkContext. Internally, that
singleton listener could maintain a hashmap from sessionUUID ->
session-specific-listener-bus. This would mean that QueryExecutionListener
event dispatch would be O(1) instead of O(numSessions), which is a huge
improvement.
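
A minimal sketch of that dispatch idea, using stand-in types rather than Spark's real classes. `Event`, `SessionBus`, and `RootListener` are hypothetical names invented for illustration, and the assumption that every event carries its session's UUID is mine, not something this PR establishes:

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a query-execution event tagged with its session.
final case class Event(sessionUUID: UUID, funcName: String)

// Hypothetical stand-in for a session-specific listener bus.
// It only records the events it receives.
final class SessionBus {
  val received = ArrayBuffer.empty[Event]
  def post(e: Event): Unit = synchronized { received += e }
}

// The single root listener that would be registered once per SparkContext.
final class RootListener {
  private val buses = new ConcurrentHashMap[UUID, SessionBus]()

  // Returns the existing bus for this session, or creates one.
  def register(sessionUUID: UUID): SessionBus =
    buses.computeIfAbsent(sessionUUID, (_: UUID) => new SessionBus)

  def unregister(sessionUUID: UUID): Unit = {
    buses.remove(sessionUUID)
    ()
  }

  // One hash lookup per event, independent of the number of sessions.
  def onEvent(e: Event): Unit = {
    val bus = buses.get(e.sessionUUID)
    if (bus != null) bus.post(e)
  }
}
```

With this shape, an event for session A reaches only A's bus, and events for sessions without a registered bus are dropped after a single lookup.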
When implementing this idea, I'd like to avoid actual global singleton
instances because they make testing harder. Instead, I'd probably have an `object
ExecutionListenerManager { val registrationLock = new Object; def
getOrRegisterListener(sc: SparkContext): ExecutionListenerManager }` where the
only global state is a lock, which is used to register the per-SparkContext
root listener manager.
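
One possible reading of that registration scheme, sketched without a Spark dependency. `FakeSparkContext` and its `listeners` buffer are hypothetical stand-ins for `SparkContext` and its listener bus, and the `ExecutionListenerManager` here is an empty placeholder rather than Spark's existing class:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for SparkContext: it only models registered listeners.
final class FakeSparkContext {
  val listeners = ArrayBuffer.empty[AnyRef]
}

// Placeholder for the per-context root manager. In the real design it would
// own the sessionUUID -> session-specific-listener-bus map.
final class ExecutionListenerManager

object ExecutionListenerManager {
  // The only global state is this lock; no manager instance lives here.
  private val registrationLock = new Object

  // Returns the manager already registered on `sc`, or registers a new one.
  def getOrRegisterListener(sc: FakeSparkContext): ExecutionListenerManager =
    registrationLock.synchronized {
      sc.listeners
        .collectFirst { case m: ExecutionListenerManager => m }
        .getOrElse {
          val m = new ExecutionListenerManager
          sc.listeners += m
          m
        }
    }
}
```

Because the manager is stored on the context itself, each test can create a fresh context and get a fresh manager, with nothing to reset between tests.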
I'd be interested in implementing that larger change, but I'd like to do it
in a separate PR because it's a lot larger in scope. This current PR's fix is
surgical and improves performance in a real-world use-case, so I'd like to
start by merging this small fix and tackle the larger redesign in a separate
followup.