vinooganesh commented on pull request #28128: URL: https://github.com/apache/spark/pull/28128#issuecomment-624851677
Hey @cloud-fan - Sure. Right now the listener issue is coupled with the operating model for `SparkSession`s, which I think is where the confusion is coming from.

Currently, every time a `SparkSession` is created, a listener is attached to the `SparkContext` - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L947. Because of the way the listener is created, the reference to it is lost once it has been attached to the `SparkContext`. This means there is currently no way to remove the listener from the `SparkContext`, even after the lifetime of the session is "done". Even if I call `clearActiveSession()` or `clearDefaultSession()`, the listener continues to live on the `SparkContext`, even after the sessions themselves have been GCed.

This creates an issue for JVMs with multi-tenancy. Many `SparkSession`s may be spun up, but without a clean way to remove their listeners, we leak them. This is the listener memory leak. I think my PR description may have been unclear - there isn't a leak of the `SparkSession` instances themselves (at least not that I'm aware of).

The reason this PR does more than just remove the listeners is that there is no lifecycle method that marks the end of a `SparkSession` without also killing the underlying `SparkContext`. The only option is `stop()`, which then kills all other sessions that rely on the `SparkContext` staying alive. The strangeness here is the interaction between `SparkSession`s, which should be lightweight and easy to clean up, and the longer-lived `SparkContext`, which should exist for as long as any `SparkSession` is alive.

So, in order to ever clean up the listener leak, we need a way to mark a session as over, and that currently doesn't exist. Does that make sense?
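To make the leak described above concrete, here is a rough toy model in Python. It is not Spark's code and uses no Spark API: `ToyContext`, `LeakySession`, `ClosableSession`, and `listeners_after` are hypothetical names invented for this sketch, and `close()` only stands in for the kind of session-level lifecycle method the comment argues is missing. It assumes one listener per session and a context that holds strong references to its listeners.

```python
class ToyContext:
    """Stands in for the long-lived, shared context (hypothetical, simplified)."""

    def __init__(self):
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def remove_listener(self, listener):
        self.listeners.remove(listener)


class LeakySession:
    """Models the behaviour described above: the listener reference is dropped."""

    def __init__(self, context):
        self.context = context
        # The listener is created inline and handed straight to the context,
        # so the session keeps no reference it could later use to remove it.
        context.add_listener(lambda event: None)


class ClosableSession:
    """Models one possible fix: keep the reference and add a lifecycle hook."""

    def __init__(self, context):
        self.context = context
        self._listener = lambda event: None
        context.add_listener(self._listener)

    def close(self):
        # Ends this session only; the shared context stays alive for others.
        if self._listener is not None:
            self.context.remove_listener(self._listener)
            self._listener = None


def listeners_after(session_cls, n):
    """Create n sessions on one context, end each, and count leftover listeners."""
    context = ToyContext()
    for _ in range(n):
        session = session_cls(context)
        if hasattr(session, "close"):
            session.close()
        del session  # the session object itself is collectable either way
    return len(context.listeners)


if __name__ == "__main__":
    print(listeners_after(LeakySession, 1000))     # listeners still attached
    print(listeners_after(ClosableSession, 1000))  # listeners still attached
```

In this model the leaky variant leaves 1000 listeners on the context after 1000 sessions have come and gone, while the closable variant leaves none and never has to stop the context. The session objects themselves are not what accumulates, which matches the distinction drawn in the comment.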
