I have tried adding 'classloader.parent-first-patterns.additional: "ru.yandex.clickhouse"' to the Flink configuration, but the problem still exists.
Is there a lightweight way to put the ClickHouse JDBC driver into Flink's lib/ folder?
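A quick way to see whether the parent-first pattern, or a driver JAR in lib/, has taken effect is to log from inside a running task which classloader actually defines the driver class. The check is worth doing because a parent-first pattern most likely changes only the lookup order. If the JAR is not in lib/, the class may still be defined from the user JAR.

This is a minimal sketch. The helper `ClassLoaderCheck.describe` is hypothetical. The class name `ru.yandex.clickhouse.ClickHouseDriver` is assumed to be the driver class shipped in clickhouse-jdbc 0.3.0.

```java
import java.util.Optional;

public final class ClassLoaderCheck {

    private ClassLoaderCheck() {}

    /**
     * Resolves {@code className} through {@code start} without initializing it
     * (so a JDBC driver's static block does not register it as a side effect)
     * and returns the name of the classloader class that defined it.
     */
    public static Optional<String> describe(String className, ClassLoader start) {
        try {
            Class<?> cls = Class.forName(className, false, start);
            ClassLoader loader = cls.getClassLoader();
            // Classes defined by the bootstrap loader report a null classloader.
            return Optional.of(loader == null ? "bootstrap" : loader.getClass().getName());
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Inside a Flink operator one would pass getRuntimeContext().getUserCodeClassLoader()
        // or Thread.currentThread().getContextClassLoader() instead.
        ClassLoader start = Thread.currentThread().getContextClassLoader();
        System.out.println(describe("ru.yandex.clickhouse.ClickHouseDriver", start)
                .orElse("driver class not found"));
    }
}
```

Reading the result:

- `org.apache.flink.util.ChildFirstClassLoader` means the driver is still coming from the user JAR.
- The application classloader (`sun.misc.Launcher$AppClassLoader` on Java 8, as in the heap dumps below) suggests the driver was picked up from lib/.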
------------------ Original Message ------------------
From: "Maciek Próchniak" <[email protected]>;
Sent: Friday, April 9, 2021 3:24
To: "??????"<[email protected]>;"Arvid Heise"<[email protected]>;"Yangze Guo"<[email protected]>;
Cc: "user"<[email protected]>;"guowei.mgw"<[email protected]>;"renqschn"<[email protected]>;
Subject: Re: Re: period batch job lead to OutOfMemoryError: Metaspace problem
Hi,
Did you put the ClickHouse JDBC driver on Flink's main classpath (in the lib folder) rather than in the user JAR, as described here?
https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/debugging/debugging_classloading.html#unloading-of-dynamically-loaded-classes-in-user-code
When we ran into Metaspace leaks recently, in quite a few cases the problem turned out to be a JDBC driver in the user classloader. It was registered with DriverManager and caused a classloader leak.
maciek
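Why a driver in the user JAR leaks: `java.sql.DriverManager` is a JDK class that outlives every job, and it keeps a strong reference to every registered `Driver`. A driver class defined by a job's ChildFirstClassLoader therefore keeps that whole classloader, and its metaspace, reachable after the job ends. Putting the driver in lib/ is the fix suggested above.

As a stopgap, a job could deregister the drivers defined by its own classloader when it shuts down, for example from a sink's `close()`. The sketch below is hypothetical: `JdbcDriverCleanup` is not part of Flink or the JDBC connector. Tasks of the same job share one classloader on a TaskManager, so deregistering from one task can affect sibling tasks that have not yet opened their connections.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;

public final class JdbcDriverCleanup {

    private JdbcDriverCleanup() {}

    /**
     * Deregisters every JDBC driver whose class was defined by {@code loader}
     * and returns how many were removed. Drivers defined by other classloaders
     * (for example a driver JAR placed in Flink's lib/ folder) are left alone.
     */
    public static int deregisterDriversLoadedBy(ClassLoader loader) {
        int removed = 0;
        // Snapshot the registered drivers before modifying the registry.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() != loader) {
                continue;
            }
            try {
                DriverManager.deregisterDriver(driver);
                removed++;
            } catch (SQLException | SecurityException e) {
                // Keep going so that one failure does not block the remaining drivers.
                System.err.println("Could not deregister " + driver.getClass().getName() + ": " + e);
            }
        }
        return removed;
    }
}
```

Usage sketch, inside a hypothetical `RichSinkFunction` subclass `MyCkSink`: call `JdbcDriverCleanup.deregisterDriversLoadedBy(MyCkSink.class.getClassLoader())` in `close()`. Passing the defining loader of one of the job's own classes means the identity comparison is made against the loader that also defined a driver bundled in the same user JAR.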
On 08.04.2021 11:42, ?????? wrote:
My application program looks like this. Is there a problem with this structure?
public class StreamingJob {
    public static void main(String[] args) throws Exception {
        // Placeholders as posted: Parallelism, query, window, CkBatchSize, CkDriverName, CkUrl,
        // CkUser, CkPassword and the operator arguments (map(), keyBy(), ...) are elided.
        int i = 0;
        // Each iteration builds a new pipeline and calls env.execute(), i.e. it submits
        // a separate job to the session cluster.
        while (i < 100) {
            try {
                StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
                env.setRuntimeMode(RuntimeExecutionMode.BATCH);
                env.setParallelism(Parallelism);
                EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner()
                        .inStreamingMode().build();
                StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(env, bsSettings);
                bsTableEnv.executeSql("CREATE TEMPORARY TABLE xxxx");
                Table t = bsTableEnv.sqlQuery(query);
                DataStream<DataPoint> points = bsTableEnv.toAppendStream(t, DataPoint.class);
                DataStream<StatisPoint> weightPoints = points.map();
                DataStream<PredictPoint> predictPoints = weightPoints.keyBy()
                        .reduce().map();

                // side output
                final OutputTag<PredictPoint> outPutPredict = new OutputTag<PredictPoint>("predict") {
                };
                SingleOutputStreamOperator<PredictPoint> mainDataStream = predictPoints
                        .process();
                DataStream<PredictPoint> exStream = mainDataStream.getSideOutput(outPutPredict);

                // write data to ClickHouse
                String insertIntoCKSql = "xxx";
                mainDataStream.addSink(JdbcSink.sink(insertIntoCKSql, new CkSinkBuilder(),
                        new JdbcExecutionOptions.Builder().withBatchSize(CkBatchSize).build(),
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder().withDriverName(CkDriverName)
                                .withUrl(CkUrl).withUsername(CkUser).withPassword(CkPassword).build()));

                // write data to Kafka
                FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>();
                exStream.map().addSink(producer);

                env.execute("Prediction Program");
            } catch (Exception e) {
                e.printStackTrace();
            }
            i++;
            Thread.sleep(window * 1000);
        }
    }
}
------------------ Original Message ------------------
From: "Arvid Heise" <[email protected]>;
Sent: Thursday, April 8, 2021 2:33
To: "Yangze Guo"<[email protected]>;
Cc: "??????"<[email protected]>;"user"<[email protected]>;"guowei.mgw"<[email protected]>;"renqschn"<[email protected]>;
Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
Hi,
ChildFirstClassLoaders are created (more or less) per application jar, and seeing so many of them looks like a classloader leak to me. I'd expect you to see a new ChildFirstClassLoader popping up with each new job submission.
Can you check who is transitively referencing the ChildFirstClassLoader? Usually it's some thread that is lingering around because a third-party library is leaking threads, etc.
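Arvid's suggestion can be partly checked from code. A live thread keeps a classloader reachable if that loader is its context classloader, or if the thread's own class was defined by it. A thread such as kafka-producer-network-thread would likely show up here if it is still alive after its job has finished.

The sketch below scans only those two paths. A thread's `Runnable` and its thread-locals are not visible through public APIs. Eclipse MAT's "Path To GC Roots" view on a ChildFirstClassLoader instance in the heap dump therefore remains the more complete check. `ClassLoaderThreadScan` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public final class ClassLoaderThreadScan {

    private ClassLoaderThreadScan() {}

    /** True if {@code candidate} is {@code target} or a descendant of it (children reference their parent). */
    static boolean isSameOrChild(ClassLoader candidate, ClassLoader target) {
        for (ClassLoader cl = candidate; cl != null; cl = cl.getParent()) {
            if (cl == target) {
                return true;
            }
        }
        return false;
    }

    /**
     * Names the live threads that keep {@code target} reachable through their context
     * classloader, or because the Thread subclass itself was defined by {@code target}.
     */
    public static List<String> threadsReferencing(ClassLoader target) {
        List<String> hits = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (isSameOrChild(t.getContextClassLoader(), target)) {
                hits.add(t.getName() + " [context classloader]");
            } else if (isSameOrChild(t.getClass().getClassLoader(), target)) {
                hits.add(t.getName() + " [thread class]");
            }
        }
        return hits;
    }
}
```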
OneInputStreamTask is legit and just indicates that you
have a job running with 4 slots on that TM. It should not hold any
dedicated metaspace memory.
On Thu, Apr 8, 2021 at 4:52 AM Yangze Guo <[email protected]> wrote:
I went through the JM & TM logs but could not find any useful clue.
The exception is actually thrown by kafka-producer-network-thread.
Maybe @Qingsheng could also take a look?
Best,
Yangze Guo
On Thu, Apr 8, 2021 at 10:39 AM ?????? <[email protected]> wrote:
>
> I have set it to 512M, but the problem still exists. The memory size is still 256M now.
> The TM and JM logs are attached.
>
> Looking forward to your reply.
>
> ------------------ Original Message ------------------
> From: "Yangze Guo" <[email protected]>;
> Sent: Tuesday, April 6, 2021 6:35
> To: "??????"<[email protected]>;
> Cc: "user"<[email protected]>;"guowei.mgw"<[email protected]>;
> Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>
> > I have tried this method, but the problem still exists.
> How much memory did you configure for it?
>
> > Is it normal to have 21 instances of "org.apache.flink.util.ChildFirstClassLoader"?
> Not quite sure about it. AFAIK, each job has its own classloader.
> Multiple tasks of the same job in the same TM share that classloader.
> The classloader is removed once no task of the job is running on the TM any more,
> and a classloader without references is eventually cleaned up by GC.
> Could you share the JM and TM logs for further analysis?
> I'll also involve @Guowei Ma in this thread.
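The lifecycle Yangze describes can be modeled as reference counting per job:

1. The first task of a job on a TaskManager creates the classloader.
2. Later tasks of the same job reuse it.
3. The entry is dropped when the last task finishes.

This is a simplified, hypothetical model (`JobClassLoaderRegistry` is not a Flink class). It only illustrates the expectation that, with one job submitted per loop iteration, each finished job leaves no loader behind unless something outside the registry still references it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model: one classloader per job, shared by that job's tasks on one TaskManager. */
public final class JobClassLoaderRegistry {

    private static final class Entry {
        final ClassLoader loader;
        int activeTasks;

        Entry(ClassLoader loader) {
            this.loader = loader;
        }
    }

    private final Map<String, Entry> byJob = new HashMap<>();

    /** Called when a task of {@code jobId} starts; only the first task creates the loader. */
    public synchronized ClassLoader acquire(String jobId, Supplier<ClassLoader> factory) {
        Entry e = byJob.computeIfAbsent(jobId, id -> new Entry(factory.get()));
        e.activeTasks++;
        return e.loader;
    }

    /**
     * Called when a task of {@code jobId} finishes. After the last task the registry drops
     * its reference; the loader is collectable only if nothing else (threads, DriverManager,
     * caches) still points at it.
     */
    public synchronized void release(String jobId) {
        Entry e = byJob.get(jobId);
        if (e != null && --e.activeTasks == 0) {
            byJob.remove(jobId);
        }
    }

    public synchronized int liveLoaders() {
        return byJob.size();
    }
}
```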
>
>
> Best,
> Yangze Guo
>
> On Tue, Apr 6, 2021 at 6:05 PM ?????? <[email protected]> wrote:
> >
> > I have tried this method, but the problem still exists.
> > According to the heap dump analysis, there are 21 instances of "org.apache.flink.util.ChildFirstClassLoader". Is that normal?
> >
> >
> > ------------------ Original Message ------------------
> > From: "Yangze Guo" <[email protected]>;
> > Sent: Tuesday, April 6, 2021 4:32
> > To: "??????"<[email protected]>;
> > Cc: "user"<[email protected]>;
> > Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
> >
> > I think you can try increasing the JVM metaspace size for the TaskManagers via taskmanager.memory.jvm-metaspace.size. [1]
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-metaspace
> >
> > Best,
> > Yangze Guo
> >
> > On Tue, Apr 6, 2021 at 4:22 PM ?????? <[email protected]> wrote:
> > >
> > > Batch job:
> > > reads data from S3 via SQL, processes it with several operators, and writes the results to ClickHouse and Kafka.
> > > After some time, the TaskManager quits with OutOfMemoryError: Metaspace.
> > >
> > > Environment:
> > > Flink version: 1.12.2
> > > TaskManager slot count: 5
> > > Deployment: standalone Kubernetes, session mode
> > > Dependencies:
> > >
> > > <dependency>
> > >     <groupId>org.apache.flink</groupId>
> > >     <artifactId>flink-connector-kafka_2.11</artifactId>
> > >     <version>${flink.version}</version>
> > > </dependency>
> > > <dependency>
> > >     <groupId>com.google.code.gson</groupId>
> > >     <artifactId>gson</artifactId>
> > >     <version>2.8.5</version>
> > > </dependency>
> > > <dependency>
> > >     <groupId>org.apache.flink</groupId>
> > >     <artifactId>flink-connector-jdbc_2.11</artifactId>
> > >     <version>${flink.version}</version>
> > > </dependency>
> > > <dependency>
> > >     <groupId>ru.yandex.clickhouse</groupId>
> > >     <artifactId>clickhouse-jdbc</artifactId>
> > >     <version>0.3.0</version>
> > > </dependency>
> > > <dependency>
> > >     <groupId>org.apache.flink</groupId>
> > >     <artifactId>flink-parquet_2.11</artifactId>
> > >     <version>${flink.version}</version>
> > > </dependency>
> > > <dependency>
> > >     <groupId>org.apache.flink</groupId>
> > >     <artifactId>flink-json</artifactId>
> > >     <version>${flink.version}</version>
> > > </dependency>
> > >
> > >
> > > heap dump1:
> > >
> > > Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 29,656,880 (41.16%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca2a1e8 - 1,474,760 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2af820 - 1,474,168 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdcaa10 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6aab0 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d1111d8 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2bb108 - 1,474,128 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73de202e0 - 1,474,120 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dadc778 - 1,474,112 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d5f70e8 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d93aa38 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e179638 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dc80418 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dfcda60 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e4bcd38 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d6006e8 - 1,474,032 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7d2ad8 - 1,461,944 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca1bb98 - 1,460,752 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf203f0 - 1,460,744 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e3284a8 - 1,445,232 (2.01%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e65de00 - 1,445,232 (2.01%) bytes.
> > >
> > > Problem Suspect 2
> > >
> > > 34,407 instances of "org.apache.flink.core.memory.HybridMemorySegment", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 7,707,168 (10.70%) bytes.
> > >
> > > heap dump2:
> > >
> > > Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 26,061,408 (30.68%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e9e9930 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73edce0b8 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f1ad7d0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f3e5118 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f5d3fe0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ebd8d28 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73efc00c0 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e2251a8 - 1,474,136 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cc24af0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdca3e0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6f860 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d114768 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca6f878 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2b7640 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2c1d80 - 1,474,040 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7e2868 - 1,469,720 (1.73%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf34a98 - 1,460,808 (1.72%) bytes.
> > >
> > > Problem Suspect 2
> > >
> > > 4 instances of "org.apache.flink.streaming.runtime.tasks.OneInputStreamTask", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 11,644,200 (13.71%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73e2d0cb0 - 4,364,536 (5.14%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73d62fb88 - 3,643,576 (4.29%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73dae0270 - 3,635,952 (4.28%) bytes.