[ https://issues.apache.org/jira/browse/FLINK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arvid Heise reassigned FLINK-25023:
-----------------------------------

    Assignee: David Morávek

> ClassLoader leak on JM/TM through indirectly-started Hadoop threads out of user code
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-25023
>                 URL: https://issues.apache.org/jira/browse/FLINK-25023
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Connectors / Hadoop Compatibility, FileSystems
>    Affects Versions: 1.14.0, 1.12.5, 1.13.3
>            Reporter: Nico Kruber
>            Assignee: David Morávek
>            Priority: Major
>              Labels: pull-request-available
>
> If a Flink job uses HDFS through Flink's filesystem abstraction (on either
> the JM or the TM), that code may spawn a few threads, e.g. from static
> class members:
> * {{org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner}}
> * {{IPC Parameter Sending Thread#*}}
> These threads are started as soon as the classes are loaded, which may happen
> in the context of the user code. In this specific scenario, the created
> threads may hold references to the context class loader (I did not see
> that though) or, as happened here, may inherit thread contexts such as the
> {{ProtectionDomain}} (from an {{AccessController}}).
> Hence user contexts and user class loaders are leaked into long-running
> threads that run in Flink's (parent) classloader.
> Fortunately, it seems to only *leak a single* {{ChildFirstClassLoader}} in
> this concrete example, but that may depend on which code paths each client
> execution walks.
>
> A *proper solution* doesn't seem so simple:
> * We could try to proactively initialize the available file systems, in the
> hope of starting all threads in the parent classloader with the parent context.
> * We could create a default {{ProtectionDomain}} for spawned threads, as
> discussed at [https://dzone.com/articles/javalangoutofmemory-permgen].
> However, the {{StatisticsDataReferenceCleaner}} is not actively spawned from
> any callback; it is started from a static variable, i.e. during class loading
> itself (but maybe this is still possible somehow).

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
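
For illustration only, the first idea in the issue (make sure thread-spawning initialization runs with the parent context) could look roughly like the sketch below. It is not taken from Flink or from any linked pull request. The class {{ParentContextInit}} and both of its methods are hypothetical names, and {{spawnAndReportInheritedLoader}} is only a stand-in for a library class that starts a long-lived thread during class loading. The sketch covers only the context class loader that a new thread inherits from its creator. It does not address the inherited {{ProtectionDomain}} / {{AccessController}} context, which is what the issue reports as actually leaking.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

public class ParentContextInit {

    /**
     * Runs the action with the given loader as the current thread's context
     * class loader and restores the previous loader afterwards.
     */
    public static <T> T runWithContextClassLoader(ClassLoader loader, Callable<T> action)
            throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    /**
     * Hypothetical stand-in for a library that starts a background thread.
     * Returns the context class loader the new thread inherited from its creator.
     */
    public static ClassLoader spawnAndReportInheritedLoader() throws InterruptedException {
        AtomicReference<ClassLoader> seen = new AtomicReference<>();
        Thread t = new Thread(
                () -> seen.set(Thread.currentThread().getContextClassLoader()),
                "cleaner-stand-in");
        t.setDaemon(true);
        t.start();
        t.join();
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ParentContextInit.class.getClassLoader();
        // An empty URLClassLoader plays the role of the user-code class loader.
        try (URLClassLoader userLoader = new URLClassLoader(new URL[0], parent)) {
            Thread.currentThread().setContextClassLoader(userLoader);

            // Thread created while the user loader is the context loader.
            ClassLoader leaked = spawnAndReportInheritedLoader();

            // Same thread creation, but pinned to the parent loader first.
            ClassLoader pinned = runWithContextClassLoader(
                    parent, ParentContextInit::spawnAndReportInheritedLoader);

            System.out.println("unpinned thread inherits user loader: " + (leaked == userLoader));
            System.out.println("pinned thread inherits parent loader: " + (pinned == parent));
        }
    }
}
```

Whether such a wrapper helps in practice depends on triggering the relevant class initialization before any user code does, which is the uncertain part the issue already points out.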