vinooganesh commented on pull request #28128:
URL: https://github.com/apache/spark/pull/28128#issuecomment-624851677


   Hey @cloud-fan - Sure, right now the listener issue is coupled with the 
operating model for `SparkSession`s (which is where I think the confusion is 
coming from).
   
   Currently, every time a `SparkSession` is created, a listener is attached to 
the `SparkContext` - 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L947.
 
   
   Because of the way the listener is created, the reference to it is lost 
once it is attached to the `SparkContext`. This means that there is currently 
no way to remove the listener from the `SparkContext` (even after the lifetime 
of the session is "done"). Even if I call `clearActiveSession()` or 
`clearDefaultSession()`, the listener continues to live on the `SparkContext`, 
even after the sessions are GCed away. This creates an issue for multi-tenant 
JVMs: many `SparkSession`s may be spun up, but without a clean way to remove 
the listener, we leak it. This is the listener memory leak. 
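
   To make the leak concrete, here is a minimal sketch in plain Java. It is 
not Spark code: `Context`, `Session`, and `Listener` are hypothetical stand-ins 
for `SparkContext`, `SparkSession`, and the listener, and only the registration 
pattern is modeled (a listener is attached at construction time and no 
reference to it is kept).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListenerLeakSketch {
    /** Hypothetical stand-in for a listener interface. */
    public interface Listener {
        void onApplicationEnd();
    }

    /** Hypothetical stand-in for the long-lived, shared context. */
    public static final class Context {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();

        public void addListener(Listener listener) {
            listeners.add(listener);
        }

        public int listenerCount() {
            return listeners.size();
        }
    }

    /** Hypothetical stand-in for a short-lived session. */
    public static final class Session {
        public Session(Context ctx) {
            // The listener is created inline and handed straight to the
            // context. The session keeps no reference to it, so nothing
            // can ever ask the context to remove it again.
            ctx.addListener(new Listener() {
                @Override
                public void onApplicationEnd() {
                    // would reset session state in a real implementation
                }
            });
        }
    }

    /** Creates n sessions, drops them immediately, and reports what is left on the context. */
    public static int createAndDropSessions(Context ctx, int n) {
        for (int i = 0; i < n; i++) {
            new Session(ctx); // unreachable right away, eligible for GC
        }
        return ctx.listenerCount();
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        // The sessions are gone, but the context still holds one listener each.
        System.out.println(createAndDropSessions(ctx, 1000)); // prints 1000
    }
}
```

   The sessions themselves can be collected, but the context's listener list 
grows by one entry per session and never shrinks.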
   
   I think my PR description may have been unclear - there isn't a leak of the 
`SparkSession` instances themselves (at least that I'm aware of).
   
   The reason this PR does more than just remove the listeners is that there 
isn't a lifecycle method that marks the end of a `SparkSession` without 
killing the underlying `SparkContext` (`stop()` does that, which then kills all 
other sessions that rely on the `SparkContext` staying alive). The strangeness 
here is the interaction between `SparkSession`s, which should be lightweight 
and easy to clean up, and the longer-lived `SparkContext`, which should exist 
for as long as any `SparkSession` is alive. 
   
   So, in order to ever clean up the listener leak, we need a way to mark the 
session as over, and that currently doesn't exist. 
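
   As a rough illustration of what such a lifecycle hook could look like, here 
is a sketch in plain Java. Again, none of this is the Spark API or the code in 
this PR: `Context`, `Session`, `Listener`, and `close()` are hypothetical 
names. The only idea shown is that the session keeps a reference to its own 
listener so that ending the session can detach it while the shared context 
keeps running.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SessionCloseSketch {
    /** Hypothetical stand-in for a listener interface. */
    public interface Listener {
        void onApplicationEnd();
    }

    /** Hypothetical stand-in for the long-lived, shared context. */
    public static final class Context {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();
        private volatile boolean stopped = false;

        public void addListener(Listener listener) {
            listeners.add(listener);
        }

        public void removeListener(Listener listener) {
            listeners.remove(listener);
        }

        public int listenerCount() {
            return listeners.size();
        }

        public boolean isStopped() {
            return stopped;
        }
    }

    /** Hypothetical session whose end can be marked without stopping the context. */
    public static final class Session implements AutoCloseable {
        private final Context ctx;
        private final Listener listener; // kept so it can be removed later

        public Session(Context ctx) {
            this.ctx = ctx;
            this.listener = new Listener() {
                @Override
                public void onApplicationEnd() {
                    // would reset session state in a real implementation
                }
            };
            ctx.addListener(listener);
        }

        /** Marks this session as over: detaches its listener, leaves the context alone. */
        @Override
        public void close() {
            ctx.removeListener(listener);
        }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        Session longLived = new Session(ctx);
        try (Session shortLived = new Session(ctx)) {
            System.out.println(ctx.listenerCount()); // prints 2
        }
        // Only the closed session's listener is gone; the context is still alive.
        System.out.println(ctx.listenerCount()); // prints 1
        System.out.println(ctx.isStopped());     // prints false
        longLived.close();
    }
}
```

   Closing one session leaves the other session and the context untouched, 
which is the behavior that `stop()` cannot provide today.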
   
   Does that make sense? 




