This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7971e1c6a7c [SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher
7971e1c6a7c is described below

commit 7971e1c6a7c074c65829c2bdfad857a33e0a7a5d
Author: Xieming LI <risyo...@gmail.com>
AuthorDate: Fri Jun 30 08:20:04 2023 -0500

    [SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher

    ### What changes were proposed in this pull request?

    Use `FileSystem.closeAllForUGI` to close the cached FileSystem instances and prevent a memory leak.

    ### Why are the changes needed?

    There seems to be a memory leak in FileSystem.CACHE when submitting apps to a secure cluster using InProcessLauncher.
    For more detail, see [SPARK-41599](https://issues.apache.org/jira/browse/SPARK-41599)

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    I tested the patch with my own code, which uses InProcessLauncher, and confirmed that the memory leak is mitigated.

    <img width="1059" alt="Screenshot 2023-06-23 at 11 46 52" src="https://github.com/apache/spark/assets/4378066/cfdef4d3-cb43-464c-bb46-de60f3b91622">

    It would be very helpful to get some feedback, and I will add test cases if required.

    Closes #41692 from risyomei/fix-SPARK-41599.

    Authored-by: Xieming LI <risyo...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala       | 2 ++
 .../apache/spark/deploy/security/HadoopDelegationTokenManager.scala | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index 8f9477385e7..60253ed5fda 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -186,6 +186,8 @@ private[spark] class SparkSubmit extends Logging {
             } else {
               throw e
             }
+        } finally {
+          FileSystem.closeAllForUGI(proxyUser)
         }
       }
     } else {
diff --git a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
index 6ce195b6c7a..54a24927ded 100644
--- a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
@@ -26,6 +26,7 @@ import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
 import scala.collection.mutable
 
 import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.FileSystem
 import org.apache.hadoop.security.{Credentials, UserGroupInformation}
 
 import org.apache.spark.SparkConf
@@ -149,6 +150,9 @@ private[spark] class HadoopDelegationTokenManager(
         creds.addAll(newTokens)
       }
     })
+    if(!currentUser.equals(freshUGI)) {
+      FileSystem.closeAllForUGI(freshUGI)
+    }
   }
 }
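
For readers unfamiliar with why a per-user cleanup call helps, the following is a minimal, dependency-free sketch of the mechanism. It is a simplified model, not Hadoop's or Spark's actual code: `Ugi`, `CacheKey`, `FsHandle`, `FsCache` and `LeakDemo` are hypothetical names invented for illustration. The model assumes, as in Hadoop, that the cache key includes the user object and that two user objects for the same user name do not compare equal, so every submission that creates a fresh user object adds a new cache entry unless it is explicitly evicted.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Hadoop UGI. It does not override equals/hashCode,
// so two instances with the same user name are still different map keys.
final class Ugi(val userName: String)

// Hypothetical cache key: file system scheme, authority, and owning user.
final case class CacheKey(scheme: String, authority: String, ugi: Ugi)

// Hypothetical cached handle that can be closed.
final class FsHandle(val key: CacheKey) {
  var closed = false
  def close(): Unit = closed = true
}

// Simplified model of a per-process file system cache.
final class FsCache {
  private val entries = mutable.Map.empty[CacheKey, FsHandle]

  // Returns the cached handle for the key, creating it on first use.
  def get(scheme: String, authority: String, ugi: Ugi): FsHandle = {
    val key = CacheKey(scheme, authority, ugi)
    entries.getOrElseUpdate(key, new FsHandle(key))
  }

  // Model of a "close all for this user" call: close and evict every
  // entry owned by exactly this user object.
  def closeAllForUgi(ugi: Ugi): Unit = {
    val owned = entries.keys.filter(_.ugi eq ugi).toList
    owned.foreach(k => entries.remove(k).foreach(_.close()))
  }

  def size: Int = entries.size
}

object LeakDemo {
  // Simulates n in-process submissions, each creating a fresh user object,
  // and returns how many entries remain in the cache afterwards.
  def entriesAfter(n: Int, cleanup: Boolean): Int = {
    val cache = new FsCache
    (1 to n).foreach { _ =>
      val proxy = new Ugi("alice")
      try cache.get("hdfs", "nn:8020", proxy)
      finally if (cleanup) cache.closeAllForUgi(proxy)
    }
    cache.size
  }
}
```

In this model, `LeakDemo.entriesAfter(100, cleanup = false)` leaves 100 entries behind, while `LeakDemo.entriesAfter(100, cleanup = true)` leaves none, which mirrors the try/finally cleanup added to SparkSubmit in the diff above.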