IIUC, your program will eventually create 100 ChildFirstClassLoader
instances in a TM. But each of them should be garbage-collected once its
job has finished. So, as Arvid said, you'd better check who is still
referencing those ChildFirstClassLoader instances.
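
One way to start that check, as a rough sketch only: list the live threads
in the TM JVM whose context classloader looks like a user-code classloader.
The class name ClassLoaderThreadReport and the method threadsWithLoader are
made up for this illustration. The code uses only the JDK and would have to
run inside the TaskManager process (for example, called from user code) to
be useful. It only sees references held as a thread's context classloader,
so the path-to-GC-roots view of a heap dump tool is still the more complete
answer. The loader may also be wrapped in another class depending on the
Flink version, so treat the "ChildFirstClassLoader" marker as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderThreadReport {

    /**
     * Returns one "threadName -> loader" entry for every live thread whose
     * context classloader's class name contains the given marker.
     */
    public static List<String> threadsWithLoader(String loaderClassNameMarker) {
        List<String> hits = new ArrayList<>();
        // getAllStackTraces() returns a map keyed by all live threads.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ClassLoader cl = t.getContextClassLoader();
            if (cl != null && cl.getClass().getName().contains(loaderClassNameMarker)) {
                hits.add(t.getName() + " -> " + cl);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumption: the marker matches the class name shown in the heap dump.
        for (String line : threadsWithLoader("ChildFirstClassLoader")) {
            System.out.println(line);
        }
    }
}
```

If threads still show up here after all jobs have finished, their names
(for example a producer or connection-pool thread) usually hint at which
library did not shut down.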


Best,
Yangze Guo

On Thu, Apr 8, 2021 at 5:43 PM 太平洋 <495635...@qq.com> wrote:
>
> My application program looks like this. Is there a problem with this structure?
>
> public class StreamingJob {
>     public static void main(String[] args) throws Exception {
>         int i = 0;
>         while (i < 100) {
>             try {
>                 StreamExecutionEnvironment env =
>                         StreamExecutionEnvironment.getExecutionEnvironment();
>                 env.setRuntimeMode(RuntimeExecutionMode.BATCH);
>                 env.setParallelism(Parallelism);
>
>                 EnvironmentSettings bsSettings = EnvironmentSettings.newInstance()
>                         .useBlinkPlanner().inStreamingMode().build();
>                 StreamTableEnvironment bsTableEnv =
>                         StreamTableEnvironment.create(env, bsSettings);
>
>                 bsTableEnv.executeSql("CREATE TEMPORARY TABLE xxxx");
>                 Table t = bsTableEnv.sqlQuery(query);
>
>                 DataStream<DataPoint> points =
>                         bsTableEnv.toAppendStream(t, DataPoint.class);
>
>                 DataStream<StatisPoint> weightPoints = points.map();
>
>                 DataStream<PredictPoint> predictPoints = weightPoints.keyBy()
>                         .reduce().map();
>
>                 // side output
>                 final OutputTag<PredictPoint> outPutPredict =
>                         new OutputTag<PredictPoint>("predict") {
>                         };
>
>                 SingleOutputStreamOperator<PredictPoint> mainDataStream =
>                         predictPoints.process();
>
>                 DataStream<PredictPoint> exStream =
>                         mainDataStream.getSideOutput(outPutPredict);
>
>                 // write data to clickhouse
>                 String insertIntoCKSql = "xxx";
>                 mainDataStream.addSink(JdbcSink.sink(insertIntoCKSql, new CkSinkBuilder(),
>                         new JdbcExecutionOptions.Builder().withBatchSize(CkBatchSize).build(),
>                         new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
>                                 .withDriverName(CkDriverName)
>                                 .withUrl(CkUrl).withUsername(CkUser).withPassword(CkPassword)
>                                 .build()));
>
>                 // write data to kafka
>                 FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>();
>                 exStream.map().addSink(producer);
>
>                 env.execute("Prediction Program");
>             } catch (Exception e) {
>                 e.printStackTrace();
>             }
>             i++;
>             Thread.sleep(window * 1000);
>         }
>     }
> }
>
>
>
> ------------------ Original Message ------------------
> From: "Arvid Heise" <ar...@apache.org>;
> Sent: Thursday, April 8, 2021, 2:33 PM
> To: "Yangze Guo"<karma...@gmail.com>;
> Cc: "太平洋"<495635...@qq.com>;"user"<user@flink.apache.org>;"guowei.mgw"<guowei....@gmail.com>;"renqschn"<renqs...@gmail.com>;
> Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>
> Hi,
>
> ChildFirstClassLoaders are created (more or less) per application jar, and 
> seeing so many of them looks like a classloader leak to me. I'd expect you 
> to see a new ChildFirstClassLoader popping up with each new job submission.
>
> Can you check who is referencing the ChildFirstClassLoader transitively? 
> Usually, it's some thread that is lingering around because a third-party 
> library is leaking threads or similar resources.
>
> OneInputStreamTask is legit and just indicates that you have a job running 
> with 4 slots on that TM. It should not hold any dedicated metaspace memory.
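
To illustrate the lingering-thread case described above, here is a small
JDK-only sketch, not Flink code. A plain URLClassLoader stands in for the
per-job ChildFirstClassLoader, and the names LingeringThreadDemo and
startLingeringThread are made up for this example. A thread that never
exits keeps its context classloader strongly reachable, so the loader (and
the class metadata it owns) cannot be collected, even though the "job" has
dropped its own reference.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.CountDownLatch;

public class LingeringThreadDemo {

    /** Starts a daemon thread that blocks until interrupted, with the given context classloader. */
    public static Thread startLingeringThread(ClassLoader loader) {
        Thread t = new Thread(() -> {
            try {
                new CountDownLatch(1).await(); // never counted down: blocks until interrupted
            } catch (InterruptedException ignored) {
                // exit quietly when asked to stop
            }
        }, "lingering-worker");
        t.setContextClassLoader(loader);
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a per-job user-code classloader (assumption for this demo).
        ClassLoader jobLoader = new URLClassLoader(new URL[0]);
        WeakReference<ClassLoader> ref = new WeakReference<>(jobLoader);

        Thread worker = startLingeringThread(jobLoader);
        jobLoader = null; // the "job" is done and drops its own reference

        System.gc();
        // The live thread still holds the loader as its context classloader.
        System.out.println("thread alive, loader reachable: " + (ref.get() != null));

        worker.interrupt();
        worker.join();
        worker = null; // the Thread object itself may still reference the loader
        System.gc();   // only a hint; whether the loader is gone now is up to the JVM
        System.out.println("thread stopped, loader reachable: " + (ref.get() != null));
    }
}
```

In a job, the equivalent of worker.interrupt() is making sure that every
thread, pool, or client started by user code or a connector is shut down
when the function or sink is closed.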
>
> On Thu, Apr 8, 2021 at 4:52 AM Yangze Guo <karma...@gmail.com> wrote:
>>
>> I went through the JM & TM logs but could not find any valuable clue.
>> The exception is actually thrown by kafka-producer-network-thread.
>> Maybe @Qingsheng could also take a look?
>>
>>
>> Best,
>> Yangze Guo
>>
>> On Thu, Apr 8, 2021 at 10:39 AM 太平洋 <495635...@qq.com> wrote:
>> >
>> > I have configured it to 512M, but the problem still exists. The memory 
>> > size is currently still 256M.
>> > Attachments are TM and JM logs.
>> >
>> > Look forward to your reply.
>> >
>> > ------------------ Original Message ------------------
>> > From: "Yangze Guo" <karma...@gmail.com>;
>> > Sent: Tuesday, April 6, 2021, 6:35 PM
>> > To: "太平洋"<495635...@qq.com>;
>> > Cc: "user"<user@flink.apache.org>;"guowei.mgw"<guowei....@gmail.com>;
>> > Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>> >
>> > > I have tried this method, but the problem still exists.
>> > How much memory do you configure for it?
>> >
>> > > is 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal
>> > Not quite sure about it. AFAIK, each job has its own classloader.
>> > Multiple tasks of the same job in the same TM share that
>> > classloader. The classloader is released once no more tasks of the
>> > job are running on the TM, and a classloader without any remaining
>> > references is eventually cleaned up by GC. Could you share the JM and
>> > TM logs for further analysis?
>> > I'll also involve @Guowei Ma in this thread.
>> >
>> >
>> > Best,
>> > Yangze Guo
>> >
>> > On Tue, Apr 6, 2021 at 6:05 PM 太平洋 <495635...@qq.com> wrote:
>> > >
>> > > I have tried this method, but the problem still exists.
>> > > According to the heap dump analysis, there are 21 instances of 
>> > > "org.apache.flink.util.ChildFirstClassLoader". Is that normal?
>> > >
>> > >
>> > > ------------------ Original Message ------------------
>> > > From: "Yangze Guo" <karma...@gmail.com>;
>> > > Sent: Tuesday, April 6, 2021, 4:32 PM
>> > > To: "太平洋"<495635...@qq.com>;
>> > > Cc: "user"<user@flink.apache.org>;
>> > > Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>> > >
>> > > I think you can try to increase the JVM metaspace option for
>> > > TaskManagers through taskmanager.memory.jvm-metaspace.size. [1]
>> > >
>> > > [1] 
>> > > https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-metaspace
>> > >
>> > > Best,
>> > > Yangze Guo
>> > >
>> > >
>> > > On Tue, Apr 6, 2021 at 4:22 PM 太平洋 <495635...@qq.com> wrote:
>> > > >
>> > > > batch job:
>> > > > The job reads data from S3 via SQL, applies some operators, and 
>> > > > writes the results to ClickHouse and Kafka.
>> > > > After some time, the task-manager quits with OutOfMemoryError: Metaspace.
>> > > >
>> > > > env:
>> > > > flink version: 1.12.2
>> > > > task-manager slot count: 5
>> > > > deployment: standalone Kubernetes session mode
>> > > > dependencies:
>> > > >
>> > > >     <dependency>
>> > > >       <groupId>org.apache.flink</groupId>
>> > > >       <artifactId>flink-connector-kafka_2.11</artifactId>
>> > > >       <version>${flink.version}</version>
>> > > >     </dependency>
>> > > >     <dependency>
>> > > >       <groupId>com.google.code.gson</groupId>
>> > > >       <artifactId>gson</artifactId>
>> > > >       <version>2.8.5</version>
>> > > >     </dependency>
>> > > >     <dependency>
>> > > >       <groupId>org.apache.flink</groupId>
>> > > >       <artifactId>flink-connector-jdbc_2.11</artifactId>
>> > > >       <version>${flink.version}</version>
>> > > >     </dependency>
>> > > >     <dependency>
>> > > >       <groupId>ru.yandex.clickhouse</groupId>
>> > > >       <artifactId>clickhouse-jdbc</artifactId>
>> > > >       <version>0.3.0</version>
>> > > >     </dependency>
>> > > >     <dependency>
>> > > >       <groupId>org.apache.flink</groupId>
>> > > >       <artifactId>flink-parquet_2.11</artifactId>
>> > > >       <version>${flink.version}</version>
>> > > >     </dependency>
>> > > >     <dependency>
>> > > >       <groupId>org.apache.flink</groupId>
>> > > >       <artifactId>flink-json</artifactId>
>> > > >       <version>${flink.version}</version>
>> > > >     </dependency>
>> > > >
>> > > >
>> > > > heap dump1:
>> > > >
>> > > > Leak Suspects
>> > > >
>> > > >
>> > > >   Problem Suspect 1
>> > > >
>> > > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded 
>> > > > by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 29,656,880 
>> > > > (41.16%) bytes.
>> > > >
>> > > > Biggest instances:
>> > > >
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca2a1e8 - 1,474,760 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2af820 - 1,474,168 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdcaa10 - 1,474,160 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6aab0 - 1,474,160 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d1111d8 - 1,474,160 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2bb108 - 1,474,128 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73de202e0 - 1,474,120 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dadc778 - 1,474,112 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d5f70e8 - 1,474,064 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d93aa38 - 1,474,064 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e179638 - 1,474,064 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dc80418 - 1,474,056 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dfcda60 - 1,474,056 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e4bcd38 - 1,474,056 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d6006e8 - 1,474,032 
>> > > > (2.05%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7d2ad8 - 1,461,944 
>> > > > (2.03%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca1bb98 - 1,460,752 
>> > > > (2.03%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf203f0 - 1,460,744 
>> > > > (2.03%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e3284a8 - 1,445,232 
>> > > > (2.01%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e65de00 - 1,445,232 
>> > > > (2.01%) bytes.
>> > > >
>> > > >
>> > > >
>> > > > Keywords
>> > > > org.apache.flink.util.ChildFirstClassLoader
>> > > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
>> > > >
>> > > >   Problem Suspect 2
>> > > >
>> > > > 34,407 instances of 
>> > > > "org.apache.flink.core.memory.HybridMemorySegment", loaded by 
>> > > > "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 7,707,168 
>> > > > (10.70%) bytes.
>> > > >
>> > > > Keywords
>> > > > org.apache.flink.core.memory.HybridMemorySegment
>> > > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > heap dump2:
>> > > >
>> > > > Leak Suspects
>> > > >
>> > > >   Problem Suspect 1
>> > > >
>> > > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded 
>> > > > by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 26,061,408 
>> > > > (30.68%) bytes.
>> > > >
>> > > > Biggest instances:
>> > > >
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e9e9930 - 1,474,224 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73edce0b8 - 1,474,224 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f1ad7d0 - 1,474,168 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f3e5118 - 1,474,168 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f5d3fe0 - 1,474,168 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ebd8d28 - 1,474,160 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73efc00c0 - 1,474,160 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e2251a8 - 1,474,136 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cc24af0 - 1,474,064 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdca3e0 - 1,474,064 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6f860 - 1,474,064 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d114768 - 1,474,064 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca6f878 - 1,474,056 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2b7640 - 1,474,056 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2c1d80 - 1,474,040 
>> > > > (1.74%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7e2868 - 1,469,720 
>> > > > (1.73%) bytes.
>> > > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf34a98 - 1,460,808 
>> > > > (1.72%) bytes.
>> > > >
>> > > >
>> > > >
>> > > > Keywords
>> > > > org.apache.flink.util.ChildFirstClassLoader
>> > > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
>> > > >
>> > > >   Problem Suspect 2
>> > > >
>> > > > 4 instances of 
>> > > > "org.apache.flink.streaming.runtime.tasks.OneInputStreamTask", loaded 
>> > > > by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 11,644,200 
>> > > > (13.71%) bytes.
>> > > >
>> > > > Biggest instances:
>> > > >
>> > > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 
>> > > > 0x73e2d0cb0 - 4,364,536 (5.14%) bytes.
>> > > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 
>> > > > 0x73d62fb88 - 3,643,576 (4.29%) bytes.
>> > > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 
>> > > > 0x73dae0270 - 3,635,952 (4.28%) bytes.
>> > > >
>> > > >
>> > > >
>> > > > Keywords
>> > > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
>> > > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask
>> > > >
>> > > >
