The setting itself isn't TaskManager-specific; it applies to both the
JobManager and TaskManager processes.
On 02/05/2022 05:29, John Smith wrote:
Also, just to be sure, this is a Task Manager setting, right?
On Thu, Apr 28, 2022 at 11:13 AM John Smith <java.dev....@gmail.com>
wrote:
I assume you will take action on your side to track and fix the
doc? :)
On Thu, Apr 28, 2022 at 11:12 AM John Smith
<java.dev....@gmail.com> wrote:
Ok so to summarize...
- Build my job jar and have the JDBC driver as a compile only
dependency and copy the JDBC driver to flink lib folder.
Or
- Build my job jar and include the JDBC driver in the shadow jar, plus
copy the JDBC driver to the flink lib folder, plus make an
entry in the config for
|classloader.parent-first-patterns.additional|
<https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>
On Thu, Apr 28, 2022 at 10:17 AM Chesnay Schepler
<ches...@apache.org> wrote:
I think what I meant was "either add it to /lib, or [if it
is already in /lib but also bundled in the jar] add it to
the parent-first patterns."
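As a rough illustration of the second case (driver jar in /lib and also bundled in the job jar), the entry in flink-conf.yaml might look like the fragment below. The org.apache.ignite. prefix is the one used elsewhere in this thread; any further driver packages would be appended as additional semicolon-separated prefixes.

```yaml
# Sketch only: this has an effect only if the driver jar is also present in Flink's lib/ folder.
# The value is a semicolon-separated list of package prefixes.
classloader.parent-first-patterns.additional: org.apache.ignite.
```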
On 28/04/2022 15:56, Chesnay Schepler wrote:
Pretty sure, even though I seemingly documented it
incorrectly :)
On 28/04/2022 15:49, John Smith wrote:
You sure?
* /JDBC/: JDBC drivers leak references outside the user code
classloader. To ensure that these classes are only loaded once
you should either add the driver jars to Flink’s |lib/| folder,
or add the driver classes to the list of parent-first loaded
class via |classloader.parent-first-patterns.additional|
<https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>.
It says either or
On Wed, Apr 27, 2022 at 3:44 AM Chesnay Schepler
<ches...@apache.org> wrote:
You're misinterpreting the docs.
The parent/child-first classloading controls where
Flink looks for a class /first/, specifically
whether we first load from /lib or the user-jar.
It does not allow you to load something from the
user-jar in the parent classloader. That's just not
how it works.
It must be in /lib.
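For context, the lookup order described above is itself a setting. A minimal sketch, assuming the standard flink-conf.yaml; child-first is Flink's default, so this line only makes the default explicit:

```yaml
# Where Flink looks for a class first: child-first = user jar, parent-first = Flink's own classpath (lib/).
# Either way, classes that exist only in the user jar are loaded by the user-code classloader.
classloader.resolve-order: child-first
```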
On 27/04/2022 04:59, John Smith wrote:
Hi Chesnay as per the docs...
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/
You can either put the jars in task manager lib
folder or use
|classloader.parent-first-patterns.additional|
<https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>
I prefer the latter; that way the dependency stays
with the user-jar and not on the task manager.
On Tue, Apr 26, 2022 at 9:52 PM John Smith
<java.dev....@gmail.com> wrote:
Ok so I should put the Apache ignite and my
Microsoft drivers in the lib folders of my task
managers?
And then in my job jar only include them as
compile time dependencies?
On Tue, Apr 26, 2022 at 10:42 AM Chesnay
Schepler <ches...@apache.org> wrote:
JDBC drivers are well-known for leaking
classloaders unfortunately.
You have correctly identified your
alternatives.
You must put the jdbc driver into /lib
instead. Setting only the parent-first
pattern shouldn't affect anything.
That is only relevant if something is
both in /lib and the user-jar, telling
Flink to prioritize what is in lib.
On 26/04/2022 15:35, John Smith wrote:
So I put classloader.parent-first-patterns.additional:
"org.apache.ignite." in the task config and so far I don't think
I'm getting "java.lang.OutOfMemoryError: Metaspace" any more.
Or it's too early to tell.
Though now, the task managers are shutting down due to some
other failures.
So maybe because tasks were failing and reloading often, the
task manager was running out of Metaspace. But now maybe it's
just cleanly shutting down.
On Wed, Apr 20, 2022 at 11:35 AM John
Smith <java.dev....@gmail.com> wrote:
Or I can put in the config to treat
org.apache.ignite. classes as first class?
On Tue, Apr 19, 2022 at 10:18 PM John
Smith <java.dev....@gmail.com> wrote:
Ok, so I loaded the dump into
Eclipse MAT and followed:
https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
- On the Histogram, I got over 30
entries for: ChildFirstClassLoader
- Then I clicked on one of them
"Merge Shortest Path..." and
picked "Exclude all
phantom/weak/soft references"
- Which then gave me:
SqlDriverManager > Apache Ignite
JdbcThin Driver
So I'm guessing this applies to anything JDBC-based. Should I
copy the drivers into the task manager lib folder and have my
jobs declare them as compile-only dependencies?
On Tue, Apr 19, 2022 at 12:18 PM Yaroslav Tkachenko
<yaros...@goldsky.io> wrote:
Also
https://shopify.engineering/optimizing-apache-flink-applications-tips
might be helpful (has a section on profiling, as well as
classloading).
On Tue, Apr 19, 2022 at 4:35 AM Chesnay Schepler
<ches...@apache.org> wrote:
We have a very rough "guide" in the wiki (it's just the specific
steps I took to debug another leak):
https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
On 19/04/2022 12:01, huweihua wrote:
Hi, John
Sorry for the late reply. You can use MAT[1] to analyze the dump
file. Check whether there are too many loaded classes.
[1] https://www.eclipse.org/mat/
On Apr 18, 2022, at 9:55 PM, John Smith <java.dev....@gmail.com>
wrote:
Hi, can anyone help with this? I never looked at a dump file
before.
On Thu, Apr 14, 2022 at 11:59 AM John Smith
<java.dev....@gmail.com> wrote:
Hi, so I have a dump file. What do I look for?
On Thu, Mar 31, 2022 at 3:28 PM John Smith
<java.dev....@gmail.com> wrote:
Ok, so if there's a leak, if I manually stop the job and restart
it from the UI multiple times, I won't see the issue because the
classes are unloaded correctly?
On Thu, Mar 31, 2022 at 9:20 AM huweihua
<huweihua....@gmail.com> wrote:
The difference is that manually canceling the job stops the
JobMaster, but automatic failover keeps the JobMaster running.
But from the TaskManager's side, it doesn't make much difference.
On Mar 31, 2022, at 4:01 AM, John Smith <java.dev....@gmail.com>
wrote:
Also, if I manually cancel and restart the same job over and
over, is it the same as if Flink was restarting a job due to
failure?
I.e., when I click "Cancel Job" on the UI, is the job completely
unloaded, vs. when the job scheduler restarts a job for whatever
reason?
Like this I'll stop and restart the job a few times, or maybe I
can trick my job to fail and have the scheduler restart it.
Ok, let me think about this...
On Wed, Mar 30, 2022 at 10:24 AM 胡伟华 <huweihua....@gmail.com>
wrote:
> So if I run the same jobs in my dev env will I still be able to
> see the similar dump?
I think running the same job in dev should be reproducible, maybe
you can have a try.
> If not I would have to wait at a low volume time to do it on
> production. Also if I recall the dump is as big as the JVM
> memory right so if I have 10GB configed for the JVM the dump
> will be 10GB file?
Yes, jmap will pause the JVM; the length of the pause depends on
the size of the dump. You can use "jmap -dump:live" to dump only
the reachable objects, which takes only a brief pause.
On Mar 30, 2022, at 9:47 PM, John Smith <java.dev....@gmail.com>
wrote:
I have 3 task managers (see config below). There is a total of 10
jobs with 25 slots being used.
The jobs are 100% ETL, i.e., they load JSON, transform it and
push it to JDBC; only 1 job of the 10 is pushing to an Apache
Ignite cluster.
For JMAP: I know that it will pause the task manager. So if I run
the same jobs in my dev env, will I still be able to see a
similar dump? I assume so. If not, I would have to wait for a
low-volume time to do it on production. Also, if I recall, the
dump is as big as the JVM memory, right? So if I have 10GB
configured for the JVM, the dump will be a 10GB file?
# Operating system has 16GB total.
env.ssh.opts: -l flink -oStrictHostKeyChecking=no
cluster.evenly-spread-out-slots: true
taskmanager.memory.flink.size: 10240m
taskmanager.memory.jvm-metaspace.size: 2048m
taskmanager.numberOfTaskSlots: 16
parallelism.default: 1
high-availability: zookeeper
high-availability.storageDir: file:///mnt/flink/ha/flink_1_14/
high-availability.zookeeper.quorum: ...
high-availability.zookeeper.path.root: /flink_1_14
high-availability.cluster-id: /flink_1_14_cluster_0001
web.upload.dir: /mnt/flink/uploads/flink_1_14
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: file:///mnt/flink/checkpoints/flink_1_14
state.savepoints.dir: file:///mnt/flink/savepoints/flink_1_14
On Wed, Mar 30, 2022 at 2:16 AM 胡伟华 <huweihua....@gmail.com>
wrote:
Hi, John
Could you tell us your application scenario? Is it a Flink
session cluster with a lot of jobs?
Maybe you can try to dump the memory with jmap and use tools such
as MAT to analyze whether there are abnormal classes and
classloaders.
> On Mar 30, 2022, at 6:09 AM, John Smith <java.dev....@gmail.com>
> wrote:
>
> Hi, running 1.14.4
>
> My task manager still fails with java.lang.OutOfMemoryError:
> Metaspace. The metaspace out-of-memory error has occurred. This
> can mean two things: either the job requires a larger size of
> JVM metaspace to load classes or there is a class loading leak.
>
> I have 2GB of metaspace configured:
> taskmanager.memory.jvm-metaspace.size: 2048m
>
> But the task nodes still fail.
>
> When looking at the UI metrics, the metaspace starts low. Now I
> see 85% usage. It seems to be a class loading leak at this
> point; how can we debug this issue?