JoshRosen commented on PR #37282:
URL: https://github.com/apache/spark/pull/37282#issuecomment-1195799851

   > I have a question: if one application needs to register many 
QueryExecutionListeners, it seems this would cause a performance regression.
   
   Yes: if an application _doesn't_ use QueryExecutionListener then this PR 
will improve performance, but performance will be unchanged for applications 
where each SparkSession registers a QueryExecutionListener.
   
   To improve performance in that case, we can consider restructuring the code 
so that only one listener is registered for each SparkContext. Internally, that 
singleton listener could maintain a hashmap from sessionUUID -> 
session-specific-listener-bus. This would mean that QueryExecutionListener 
event dispatch would be O(1) instead of O(numSessions), which is a huge 
improvement.
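   
   A rough sketch of that routing structure, in plain Scala so it runs without 
Spark. Every name here (`QueryEvent`, `SessionListenerBus`, `RootListener`) is a 
hypothetical stand-in rather than an existing Spark class, and the real 
listener interfaces would look different:

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable.ArrayBuffer

// Hypothetical event type; real events would carry a QueryExecution etc.
final case class QueryEvent(sessionUUID: String, funcName: String)

// Per-session bus: holds only the listeners registered by one session.
final class SessionListenerBus {
  private val listeners = ArrayBuffer.empty[QueryEvent => Unit]

  def register(l: QueryEvent => Unit): Unit = synchronized { listeners += l }

  def post(e: QueryEvent): Unit = {
    // Snapshot under the lock, then invoke callbacks outside it.
    val snapshot = synchronized { listeners.toList }
    snapshot.foreach(l => l(e))
  }
}

// The single root listener (one per SparkContext in the proposal).
final class RootListener {
  private val buses = new ConcurrentHashMap[String, SessionListenerBus]()

  def busFor(sessionUUID: String): SessionListenerBus =
    buses.computeIfAbsent(sessionUUID, _ => new SessionListenerBus)

  // One hash lookup per event, independent of the number of sessions.
  def onEvent(e: QueryEvent): Unit = {
    val bus = buses.get(e.sessionUUID)
    if (bus != null) bus.post(e)
  }
}
```

   With this shape, an event for session A is delivered only to A's bus, and 
sessions that never registered a listener cost nothing at dispatch time.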
   
   When implementing this idea, I'd like to avoid actual global singletons 
because they make testing harder. Instead, I'd probably have an `object 
ExecutionListenerManager { val registrationLock = new Object; def 
getOrRegisterListener(sc: SparkContext): ExecutionListenerManager }` where we 
have a global singleton lock which is used to register the per-SparkContext 
root listener manager.
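   
   One way the registration step could look, as a sketch only. 
`FakeSparkContext` and its listener registry are hypothetical stand-ins for 
`SparkContext` so that the example runs without Spark. It assumes the manager 
is stored on the context itself, so the lock is the only global state:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for SparkContext and its listener registry.
final class FakeSparkContext {
  private val listeners = ArrayBuffer.empty[AnyRef]
  def addListener(l: AnyRef): Unit = synchronized { listeners += l }
  def registeredListeners: List[AnyRef] = synchronized { listeners.toList }
}

// Per-context root manager; only the companion object can construct it.
final class ExecutionListenerManager private ()

object ExecutionListenerManager {
  // The only global state: a lock that makes check-then-register atomic.
  private val registrationLock = new Object

  def getOrRegisterListener(sc: FakeSparkContext): ExecutionListenerManager =
    registrationLock.synchronized {
      sc.registeredListeners
        .collectFirst { case m: ExecutionListenerManager => m }
        .getOrElse {
          val m = new ExecutionListenerManager()
          sc.addListener(m) // state lives on the context, not in a global
          m
        }
    }
}
```

   Because each manager hangs off its own context, a test can create a fresh 
context and get a fresh manager without resetting any global registry.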
   
   I'd be interested in implementing that larger change, but I'd like to do it 
in a separate PR because it's a lot larger in scope. This current PR's fix is 
surgical and improves performance in a real-world use-case, so I'd like to 
start by merging this small fix and tackle the larger redesign in a separate 
followup.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

