IIUC, your program will eventually create 100 ChildFirstClassLoader instances in a TM. But each of them should be garbage-collected once its job has finished. So, as Arvid said, you'd better check who is referencing those ChildFirstClassLoader instances.
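To make that check a bit more concrete, here is a minimal sketch (plain JDK calls, not Flink code) that lists live threads whose context classloader matches a given class-name suffix. It covers only one kind of reference: a thread can also pin a classloader through its Runnable, a ThreadLocal, or a static field, so the path to GC roots in the heap dump remains the more complete check. The names ClassLoaderHolders and threadsUsingLoader are made up for illustration, and the URLClassLoader in main is only a stand-in for a job's ChildFirstClassLoader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

class ClassLoaderHolders {

    /** Returns live threads whose context classloader's class name ends with the given suffix. */
    static List<Thread> threadsUsingLoader(String loaderClassSuffix) {
        List<Thread> holders = new ArrayList<>();
        // getAllStackTraces() is a simple way to enumerate all live threads.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            ClassLoader loader = thread.getContextClassLoader();
            if (loader != null && loader.getClass().getName().endsWith(loaderClassSuffix)) {
                holders.add(thread);
            }
        }
        return holders;
    }

    public static void main(String[] args) {
        // Stand-in for a per-job user-code classloader (hypothetical, not Flink's).
        URLClassLoader jobLoader = new URLClassLoader(new URL[0]);

        // Simulates a thread that a third-party library started and never stopped.
        Thread lingering = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // Interrupted: let the thread end.
            }
        }, "leaked-worker");
        lingering.setDaemon(true);
        lingering.setContextClassLoader(jobLoader);
        lingering.start();

        // Report every thread that still points at such a loader.
        for (Thread thread : threadsUsingLoader("URLClassLoader")) {
            System.out.println(thread.getName() + " -> " + thread.getContextClassLoader());
        }
        lingering.interrupt();
    }
}
```

Inside a TM you would pass a suffix such as "ChildFirstClassLoader" instead. Whether the leaking threads actually carry that loader as their context classloader is an assumption, so an empty result does not rule out a leak.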
Best,
Yangze Guo

On Thu, Apr 8, 2021 at 5:43 PM 太平洋 <495635...@qq.com> wrote:
>
> My application program looks like this. Does this structure have a problem?
>
> public class StreamingJob {
>     public static void main(String[] args) throws Exception {
>         int i = 0;
>         while (i < 100) {
>             try {
>                 StreamExecutionEnvironment env =
>                         StreamExecutionEnvironment.getExecutionEnvironment();
>                 env.setRuntimeMode(RuntimeExecutionMode.BATCH);
>                 env.setParallelism(Parallelism);
>
>                 EnvironmentSettings bsSettings =
>                         EnvironmentSettings.newInstance().useBlinkPlanner()
>                                 .inStreamingMode().build();
>                 StreamTableEnvironment bsTableEnv =
>                         StreamTableEnvironment.create(env, bsSettings);
>
>                 bsTableEnv.executeSql("CREATE TEMPORARY TABLE xxxx");
>                 Table t = bsTableEnv.sqlQuery(query);
>
>                 DataStream<DataPoint> points = bsTableEnv.toAppendStream(t, DataPoint.class);
>
>                 DataStream<StatisPoint> weightPoints = points.map();
>
>                 DataStream<PredictPoint> predictPoints = weightPoints.keyBy()
>                         .reduce().map();
>
>                 // side output
>                 final OutputTag<PredictPoint> outPutPredict =
>                         new OutputTag<PredictPoint>("predict") {
>                         };
>
>                 SingleOutputStreamOperator<PredictPoint> mainDataStream = predictPoints
>                         .process();
>
>                 DataStream<PredictPoint> exStream =
>                         mainDataStream.getSideOutput(outPutPredict);
>
>                 // write data to clickhouse
>                 String insertIntoCKSql = "xxx";
>                 mainDataStream.addSink(JdbcSink.sink(insertIntoCKSql, new CkSinkBuilder(),
>                         new JdbcExecutionOptions.Builder().withBatchSize(CkBatchSize).build(),
>                         new JdbcConnectionOptions.JdbcConnectionOptionsBuilder().withDriverName(CkDriverName)
>                                 .withUrl(CkUrl).withUsername(CkUser).withPassword(CkPassword).build()));
>
>                 // write data to kafka
>                 FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>();
>                 exStream.map().addSink(producer);
>
>                 env.execute("Prediction Program");
>             } catch (Exception e) {
>                 e.printStackTrace();
>             }
>             i++;
>             Thread.sleep(window * 1000);
>         }
>     }
> }
>
>
>
> ------------------ Original Message ------------------
> From: "Arvid Heise" <ar...@apache.org>;
> Sent: Thursday, April 8, 2021, 2:33 PM
> To: "Yangze Guo"<karma...@gmail.com>;
> Cc: "太平洋"<495635...@qq.com>;"user"<user@flink.apache.org>;"guowei.mgw"<guowei....@gmail.com>;"renqschn"<renqs...@gmail.com>;
> Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>
> Hi,
>
> ChildFirstClassLoader instances are created (more or less) per application jar, and
> seeing so many looks like a classloader leak to me. I'd expect you to see a
> new ChildFirstClassLoader popping up with each new job submission.
>
> Can you check who is referencing the ChildFirstClassLoader transitively?
> Usually, it's some thread that is lingering around because some third-party
> library is leaking threads etc.
>
> OneInputStreamTask is legit and just indicates that you have a job running
> with 4 slots on that TM. It should not hold any dedicated metaspace memory.
>
> On Thu, Apr 8, 2021 at 4:52 AM Yangze Guo <karma...@gmail.com> wrote:
>>
>> I went through the JM & TM logs but could not find any valuable clue.
>> The exception is actually thrown by kafka-producer-network-thread.
>> Maybe @Qingsheng could also take a look?
>>
>>
>> Best,
>> Yangze Guo
>>
>> On Thu, Apr 8, 2021 at 10:39 AM 太平洋 <495635...@qq.com> wrote:
>> >
>> > I have configured it to 512M, but the problem still exists. Now the memory size is
>> > still 256M.
>> > Attachments are TM and JM logs.
>> >
>> > Look forward to your reply.
>> >
>> > ------------------ Original Message ------------------
>> > From: "Yangze Guo" <karma...@gmail.com>;
>> > Sent: Tuesday, April 6, 2021, 6:35 PM
>> > To: "太平洋"<495635...@qq.com>;
>> > Cc: "user"<user@flink.apache.org>;"guowei.mgw"<guowei....@gmail.com>;
>> > Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>> >
>> > > I have tried this method, but the problem still exists.
>> > How much memory do you configure for it?
>> >
>> > > is 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal
>> > Not quite sure about it.
>> > AFAIK, each job will have a classloader.
>> > Multiple tasks of the same job in the same TM will share the same
>> > classloader. The classloader will be removed if there are no more tasks
>> > of that job running on the TM. A classloader without references will
>> > eventually be cleaned up by GC. Could you share JM and TM logs for further analysis?
>> > I'll also involve @Guowei Ma in this thread.
>> >
>> >
>> > Best,
>> > Yangze Guo
>> >
>> > On Tue, Apr 6, 2021 at 6:05 PM 太平洋 <495635...@qq.com> wrote:
>> > >
>> > > I have tried this method, but the problem still exists.
>> > > From the heap dump analysis, are 21 instances of
>> > > "org.apache.flink.util.ChildFirstClassLoader" normal?
>> > >
>> > >
>> > > ------------------ Original Message ------------------
>> > > From: "Yangze Guo" <karma...@gmail.com>;
>> > > Sent: Tuesday, April 6, 2021, 4:32 PM
>> > > To: "太平洋"<495635...@qq.com>;
>> > > Cc: "user"<user@flink.apache.org>;
>> > > Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>> > >
>> > > I think you can try to increase the JVM metaspace option for
>> > > TaskManagers through taskmanager.memory.jvm-metaspace.size. [1]
>> > >
>> > > [1]
>> > > https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-metaspace
>> > >
>> > > Best,
>> > > Yangze Guo
>> > >
>> > >
>> > > On Tue, Apr 6, 2021 at 4:22 PM 太平洋 <495635...@qq.com> wrote:
>> > > >
>> > > > batch job:
>> > > > read data from s3 by sql, then run it through some operators and write data to
>> > > > clickhouse and kafka.
>> > > > after some time, the task-manager quits with OutOfMemoryError: Metaspace.
>> > > >
>> > > > env:
>> > > > flink version: 1.12.2
>> > > > task-manager slot count: 5
>> > > > deployment: standalone kubernetes session mode
>> > > > dependencies:
>> > > >
>> > > > <dependency>
>> > > >   <groupId>org.apache.flink</groupId>
>> > > >   <artifactId>flink-connector-kafka_2.11</artifactId>
>> > > >   <version>${flink.version}</version>
>> > > > </dependency>
>> > > > <dependency>
>> > > >   <groupId>com.google.code.gson</groupId>
>> > > >   <artifactId>gson</artifactId>
>> > > >   <version>2.8.5</version>
>> > > > </dependency>
>> > > > <dependency>
>> > > >   <groupId>org.apache.flink</groupId>
>> > > >   <artifactId>flink-connector-jdbc_2.11</artifactId>
>> > > >   <version>${flink.version}</version>
>> > > > </dependency>
>> > > > <dependency>
>> > > >   <groupId>ru.yandex.clickhouse</groupId>
>> > > >   <artifactId>clickhouse-jdbc</artifactId>
>> > > >   <version>0.3.0</version>
>> > > > </dependency>
>> > > > <dependency>
>> > > >   <groupId>org.apache.flink</groupId>
>> > > >   <artifactId>flink-parquet_2.11</artifactId>
>> > > >   <version>${flink.version}</version>
>> > > > </dependency>
>> > > > <dependency>
>> > > >   <groupId>org.apache.flink</groupId>
>> > > >   <artifactId>flink-json</artifactId>
>> > > >   <version>${flink.version}</version>
>> > > > </dependency>
>> > > >
>> > > >
>> > > > heap dump1:
>> > > >
>> > > > Leak Suspects
>> > > >
>> > > > Problem Suspect 1
>> > > >
>> > > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by
>> > > > "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 29,656,880 (41.16%) bytes.
>> > > >
>> > > > Biggest instances:
>> > > >
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca2a1e8 - 1,474,760 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2af820 - 1,474,168 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdcaa10 - 1,474,160 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6aab0 - 1,474,160 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d1111d8 - 1,474,160 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2bb108 - 1,474,128 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73de202e0 - 1,474,120 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dadc778 - 1,474,112 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d5f70e8 - 1,474,064 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d93aa38 - 1,474,064 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e179638 - 1,474,064 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dc80418 - 1,474,056 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dfcda60 - 1,474,056 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e4bcd38 - 1,474,056 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d6006e8 - 1,474,032 (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7d2ad8 - 1,461,944 (2.03%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca1bb98 - 1,460,752 (2.03%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf203f0 - 1,460,744 (2.03%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e3284a8 - 1,445,232 (2.01%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e65de00 - 1,445,232 (2.01%) bytes.
>> > > >
>> > > > Keywords
>> > > > org.apache.flink.util.ChildFirstClassLoader
>> > > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
>> > > >
>> > > > Problem Suspect 2
>> > > >
>> > > > 34,407 instances of "org.apache.flink.core.memory.HybridMemorySegment", loaded by
>> > > > "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 7,707,168 (10.70%) bytes.
>> > > >
>> > > > Keywords
>> > > > org.apache.flink.core.memory.HybridMemorySegment
>> > > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
>> > > >
>> > > >
>> > > > heap dump2:
>> > > >
>> > > > Leak Suspects
>> > > >
>> > > > Problem Suspect 1
>> > > >
>> > > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by
>> > > > "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 26,061,408 (30.68%) bytes.
>> > > >
>> > > > Biggest instances:
>> > > >
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e9e9930 - 1,474,224 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73edce0b8 - 1,474,224 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f1ad7d0 - 1,474,168 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f3e5118 - 1,474,168 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f5d3fe0 - 1,474,168 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ebd8d28 - 1,474,160 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73efc00c0 - 1,474,160 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e2251a8 - 1,474,136 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cc24af0 - 1,474,064 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdca3e0 - 1,474,064 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6f860 - 1,474,064 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d114768 - 1,474,064 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca6f878 - 1,474,056 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2b7640 - 1,474,056 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2c1d80 - 1,474,040 (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7e2868 - 1,469,720 (1.73%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf34a98 - 1,460,808 (1.72%) bytes.
>> > > >
>> > > > Keywords
>> > > > org.apache.flink.util.ChildFirstClassLoader
>> > > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
>> > > >
>> > > > Problem Suspect 2
>> > > >
>> > > > 4 instances of "org.apache.flink.streaming.runtime.tasks.OneInputStreamTask", loaded by
>> > > > "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 11,644,200 (13.71%) bytes.
>> > > >
>> > > > Biggest instances:
>> > > >
>> > > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73e2d0cb0 - 4,364,536 (5.14%) bytes.
>> > > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73d62fb88 - 3,643,576 (4.29%) bytes.
>> > > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73dae0270 - 3,635,952 (4.28%) bytes.
>> > > >
>> > > > Keywords
>> > > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
>> > > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask