There are cases where user code runs on the JobManager.
I'm not sure, though, whether that applies to the JDBC sources.
On 02/05/2022 15:45, John Smith wrote:
Why do the JDBC jars need to be on the job manager node though?
On Mon, May 2, 2022 at 9:36 AM Chesnay Schepler <ches...@apache.org> wrote:
Yes.
But if you can ensure that the driver isn't bundled in any user-jar, you can also skip the pattern configuration step.
The pattern looks correct formatting-wise; you could try whether com.microsoft.sqlserver.jdbc. is enough to solve the issue.
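To make the pattern semantics concrete, here is a small Java sketch that splits a classloader.parent-first-patterns.additional value on ";" and checks class names against the resulting list. It assumes each entry is a plain class-name prefix (a startsWith check, no wildcards). The class ParentFirstPatterns and the driver class names are illustrative and are not Flink's own code.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: parse a ";"-separated parent-first pattern value and prefix-match class names. */
final class ParentFirstPatterns {

    /** Splits e.g. "org.apache.ignite.;com.microsoft.sqlserver.jdbc." into trimmed, non-empty prefixes. */
    static List<String> parse(String value) {
        List<String> patterns = new ArrayList<>();
        for (String part : value.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    /** True if the class name starts with any of the configured prefixes. */
    static boolean isParentFirst(String className, List<String> patterns) {
        for (String prefix : patterns) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> patterns = parse("org.apache.ignite.;com.microsoft.sqlserver.jdbc.");
        String[] names = {
            "org.apache.ignite.IgniteJdbcThinDriver",       // example driver class name
            "com.microsoft.sqlserver.jdbc.SQLServerDriver", // example driver class name
            "com.example.MyEtlJob"                          // hypothetical user class
        };
        for (String name : names) {
            System.out.println(name + " -> parent-first: " + isParentFirst(name, patterns));
        }
    }
}
```

The trailing dot keeps a prefix such as org.apache.ignite. from also matching an unrelated package like org.apache.ignitex.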
On 02/05/2022 14:41, John Smith wrote:
Oh, so I should copy the jars to the lib folder and set classloader.parent-first-patterns.additional: "org.apache.ignite.;com.microsoft.sqlserver.jdbc." on both the task managers and the job managers?
Also, is my pattern correct?
"org.apache.ignite.;com.microsoft.sqlserver.jdbc."
Just to be sure: I'm running a standalone cluster using ZooKeeper, so I have 3 ZooKeepers, 3 job managers and 3 task managers.
On Mon, May 2, 2022 at 2:57 AM Chesnay Schepler <ches...@apache.org> wrote:
And you should make sure that it is set for both processes!
On 02/05/2022 08:43, Chesnay Schepler wrote:
The setting itself isn't TaskManager-specific; it applies to both the JobManager and the TaskManager process.
On 02/05/2022 05:29, John Smith wrote:
Also, just to be sure, this is a Task Manager setting, right?
On Thu, Apr 28, 2022 at 11:13 AM John Smith <java.dev....@gmail.com> wrote:
I assume you will take action on your side to track and fix the doc? :)
On Thu, Apr 28, 2022 at 11:12 AM John Smith <java.dev....@gmail.com> wrote:
Ok, so to summarize...
- Build my job jar with the JDBC driver as a compile-only dependency, and copy the JDBC driver to the Flink lib folder.
Or
- Build my job jar and include the JDBC driver in the shadow jar, plus copy the JDBC driver to the Flink lib folder, plus make an entry in the config for classloader.parent-first-patterns.additional
<https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>
On Thu, Apr 28, 2022 at 10:17 AM Chesnay Schepler <ches...@apache.org> wrote:
I think what I meant was "either add it to /lib, or [if it is already in /lib but also bundled in the jar] add it to the parent-first patterns."
On 28/04/2022 15:56, Chesnay Schepler wrote:
Pretty sure, even though I seemingly documented it incorrectly :)
On 28/04/2022 15:49, John Smith wrote:
You sure?
* JDBC: JDBC drivers leak references outside the user code classloader. To ensure that these classes are only loaded once you should either add the driver jars to Flink’s lib/ folder, or add the driver classes to the list of parent-first loaded class via classloader.parent-first-patterns-additional <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>.
It says either/or.
On Wed, Apr 27, 2022 at 3:44 AM Chesnay Schepler <ches...@apache.org> wrote:
You're misinterpreting the docs.
The parent/child-first classloading controls where Flink looks for a class /first/, specifically whether we first load from /lib or the user-jar.
It does not allow you to load something from the user-jar in the parent classloader. That's just not how it works.
It must be in /lib.
On 27/04/2022 04:59, John Smith wrote:
Hi Chesnay, as per the docs...
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/
you can either put the jars in the task manager lib folder or use classloader.parent-first-patterns-additional <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>
I'd prefer the latter, so that the dependency stays with the user-jar and not on the task manager.
On Tue, Apr 26, 2022 at 9:52 PM John Smith <java.dev....@gmail.com> wrote:
Ok, so I should put the Apache Ignite and my Microsoft drivers in the lib folders of my task managers?
And then in my job jar only include them as compile-time dependencies?
On Tue, Apr 26, 2022 at 10:42 AM Chesnay Schepler <ches...@apache.org> wrote:
JDBC drivers are well-known for leaking classloaders, unfortunately.
You have correctly identified your alternatives.
You must put the JDBC driver into /lib instead. Setting only the parent-first pattern shouldn't affect anything.
That is only relevant if something is both in /lib and in the user-jar, telling Flink to prioritize what is in lib.
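The distinction drawn here can be modelled as a toy decision function. This is a sketch, not Flink's implementation. It assumes that:

- the user-code classloader is child-first by default;
- a parent-first pattern makes it ask the parent (/lib) first and fall back to the user-jar;
- "LIB" means the class is defined by the parent classloader rather than by the per-job classloader.

ResolveOrderModel and its inputs are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Toy model: decides which copy of a class would be used; it does not load real classes. */
final class ResolveOrderModel {

    enum Source { LIB, USER_JAR, NOT_FOUND }

    static Source resolve(String className,
                          Set<String> libClasses,
                          Set<String> userJarClasses,
                          List<String> parentFirstPatterns) {
        boolean inLib = libClasses.contains(className);
        boolean inUserJar = userJarClasses.contains(className);
        boolean parentFirst = parentFirstPatterns.stream().anyMatch(className::startsWith);

        if (parentFirst) {
            // Parent-first: ask /lib first, fall back to the user-jar.
            if (inLib) return Source.LIB;
            if (inUserJar) return Source.USER_JAR;
        } else {
            // Child-first (assumed default): the user-jar wins, /lib is only the fallback.
            if (inUserJar) return Source.USER_JAR;
            if (inLib) return Source.LIB;
        }
        return Source.NOT_FOUND;
    }

    public static void main(String[] args) {
        String driver = "org.apache.ignite.IgniteJdbcThinDriver";
        List<String> patterns = List.of("org.apache.ignite.");
        // Pattern set, driver only in the user-jar: still defined by the job's classloader.
        System.out.println(resolve(driver, Set.of(), Set.of(driver), patterns));
        // Driver in both /lib and the user-jar, pattern set: /lib wins.
        System.out.println(resolve(driver, Set.of(driver), Set.of(driver), patterns));
        // Driver in both, no pattern: the user-jar copy wins.
        System.out.println(resolve(driver, Set.of(driver), Set.of(driver), List.of()));
    }
}
```

The first case is why setting only the pattern would not help with the leak. A driver that exists only in the user-jar is still defined by the job's classloader.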
On 26/04/2022 15:35, John Smith wrote:
So I put classloader.parent-first-patterns.additional: "org.apache.ignite." in the task config, and so far I don't think I'm getting "java.lang.OutOfMemoryError: Metaspace" anymore. Or it's too early to tell.
Though now the task managers are shutting down due to some other failures.
So maybe, because tasks were failing and reloading often, the task manager was running out of Metaspace. But now maybe it's just cleanly shutting down.
On Wed, Apr 20, 2022 at 11:35 AM John Smith <java.dev....@gmail.com> wrote:
Or can I put something in the config to treat org.apache.ignite. classes as parent-first?
On Tue, Apr 19, 2022 at 10:18 PM John Smith <java.dev....@gmail.com> wrote:
Ok, so I loaded the dump into Eclipse MAT and followed:
https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
- On the Histogram, I got over 30 entries for: ChildFirstClassLoader
- Then I clicked on one of them, "Merge Shortest Path..." and picked "Exclude all phantom/weak/soft references"
- Which then gave me: SqlDriverManager > Apache Ignite JdbcThin Driver
So I'm guessing anything JDBC-based I should copy into the task manager libs folder, and make the dependencies compile-only in my jobs?
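The MAT path above can be reproduced in miniature. java.sql.DriverManager is a JDK class with a static registry. Once a driver instance is registered, that registry references the driver, its class, and, through the class, the classloader that defined it. Most JDBC drivers register themselves from a static initializer when their class is loaded.

The sketch below uses a hypothetical DummyDriver in place of a real driver. It only shows the registry reference appearing on registration and disappearing on deregistration; it does not model Flink's classloaders.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

final class DriverManagerRetention {

    /** Hypothetical stand-in for a real JDBC driver; it accepts no URLs. */
    static final class DummyDriver implements Driver {
        @Override public Connection connect(String url, Properties info) { return null; }
        @Override public boolean acceptsURL(String url) { return false; }
        @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        @Override public int getMajorVersion() { return 1; }
        @Override public int getMinorVersion() { return 0; }
        @Override public boolean jdbcCompliant() { return false; }
        @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException("not supported");
        }
    }

    /** Counts registered drivers whose class was defined by the given classloader. */
    static long driversDefinedBy(ClassLoader loader) {
        return Collections.list(DriverManager.getDrivers()).stream()
                .filter(d -> d.getClass().getClassLoader() == loader)
                .count();
    }

    /** Registers then deregisters a DummyDriver; returns the extra drivers held at each step. */
    static long[] registerThenDeregister() {
        ClassLoader loader = DummyDriver.class.getClassLoader();
        Driver driver = new DummyDriver();
        long baseline = driversDefinedBy(loader);
        try {
            DriverManager.registerDriver(driver);
            long whileRegistered = driversDefinedBy(loader) - baseline;
            DriverManager.deregisterDriver(driver);
            long afterDeregister = driversDefinedBy(loader) - baseline;
            return new long[] {whileRegistered, afterDeregister};
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = registerThenDeregister();
        System.out.println("held while registered: " + counts[0]);
        System.out.println("held after deregister: " + counts[1]);
    }
}
```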
On Tue, Apr 19, 2022 at 12:18 PM Yaroslav Tkachenko <yaros...@goldsky.io> wrote:
Also https://shopify.engineering/optimizing-apache-flink-applications-tips might be helpful (has a section on profiling, as well as classloading).
On Tue, Apr 19, 2022 at 4:35 AM Chesnay Schepler <ches...@apache.org> wrote:
We have a very rough "guide" in the wiki (it's just the specific steps I took to debug another leak):
https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
On 19/04/2022 12:01, huweihua wrote:
Hi, John
Sorry for the late reply.
You can use MAT[1] to analyze the dump file. Check whether there are too many loaded classes.
[1] https://www.eclipse.org/mat/
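Alongside MAT, the JVM's own class-loading counters give a quick runtime check for "too many loaded classes". The following is a minimal sketch using the standard ClassLoadingMXBean; ClassLoadingStats is a hypothetical helper name.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

final class ClassLoadingStats {

    /** Returns {currently loaded, total loaded since JVM start, unloaded since JVM start}. */
    static long[] snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new long[] {
            bean.getLoadedClassCount(),      // classes loaded right now
            bean.getTotalLoadedClassCount(), // cumulative loads since JVM start
            bean.getUnloadedClassCount()     // cumulative unloads since JVM start
        };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.printf("loaded now=%d, total loaded=%d, unloaded=%d%n", s[0], s[1], s[2]);
    }
}
```

One way to use it is to take a snapshot before and after a few job restarts. If the currently-loaded count keeps climbing while the unloaded count stays flat, the old classes (and the classloaders that hold them) are not being collected.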
On Apr 18, 2022 at 9:55 PM John Smith <java.dev....@gmail.com> wrote:
Hi, can anyone help with this? I've never looked at a dump file before.
On Thu, Apr 14, 2022 at 11:59 AM John Smith <java.dev....@gmail.com> wrote:
Hi, so I have a dump file. What do I look for?
On Thu, Mar 31, 2022 at 3:28 PM John Smith <java.dev....@gmail.com> wrote:
Ok, so if there's a leak and I manually stop the job and restart it from the UI multiple times, I won't see the issue because the classes are unloaded correctly?
On Thu, Mar 31, 2022 at 9:20 AM huweihua <huweihua....@gmail.com> wrote:
The difference is that manually canceling the job stops the JobMaster, but automatic failover keeps the JobMaster running. But looking at the TaskManager, it doesn't make much difference.
On Mar 31, 2022 at 4:01 AM John Smith <java.dev....@gmail.com> wrote:
Also, if I manually cancel and restart the same job over and over, is it the same as if Flink was restarting a job due to failure?
I.e.: when I click "Cancel Job" in the UI, is the job completely unloaded, vs. when the job scheduler restarts a job for whatever reason?
Like this, I'll stop and restart the job a few times, or maybe I can trick my job into failing and have the scheduler restart it.
Ok, let me think about this...
On Wed, Mar 30, 2022 at 10:24 AM 胡伟华 <huweihua....@gmail.com> wrote:
> So if I run the same jobs in my dev env, will I still be able to see a similar dump?
I think running the same job in dev should be reproducible; maybe you can give it a try.
> If not, I would have to wait for a low-volume time to do it in production.
> Also, if I recall, the dump is as big as the JVM memory, right? So if I have 10GB configured for the JVM, the dump will be a 10GB file?
Yes, jmap will pause the JVM, and the length of the pause depends on the size of the dump. You can use "jmap -dump:live" to dump only the reachable objects; this will take a brief pause.
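For reference, the usual command-line form is jmap -dump:live,format=b,file=<path> <pid>. The same live-only dump can be triggered from inside a HotSpot JVM through HotSpotDiagnosticMXBean.dumpHeap, sketched below. HeapDumper is a hypothetical helper and the file name is an assumption. Newer JDKs require the .hprof suffix and refuse to overwrite an existing file.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

final class HeapDumper {

    /**
     * Writes an hprof heap dump of the current JVM to target, which must not exist yet.
     * live=true mirrors "jmap -dump:live": only reachable objects are written.
     */
    static File dump(File target, boolean live) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.getAbsolutePath(), live);
        } catch (IOException e) {
            throw new UncheckedIOException("heap dump failed: " + target, e);
        }
        return target;
    }

    /** Picks a fresh .hprof path in the temp directory and dumps into it. */
    static File dumpToTemp(boolean live) {
        try {
            File target = File.createTempFile("taskmanager-", ".hprof");
            // createTempFile creates an empty file, and dumpHeap refuses to overwrite it.
            if (!target.delete()) {
                throw new IOException("cannot reuse temp path " + target);
            }
            return dump(target, live);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File out = dumpToTemp(true);
        System.out.println("wrote " + out.length() + " bytes to " + out);
    }
}
```

The dump file's size follows the objects actually written rather than the configured maximum heap, so a live dump is usually well below the heap limit. Also, taskmanager.memory.flink.size covers more than the JVM heap (managed and network memory live off-heap), so the heap itself is smaller than that figure.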
On Mar 30, 2022 at 9:47 PM John Smith <java.dev....@gmail.com> wrote:
I have 3 task managers (see config below). There is a total of 10 jobs with 25 slots being used.
The jobs are 100% ETL, i.e. they load JSON, transform it and push it to JDBC; only 1 of the 10 jobs is pushing to an Apache Ignite cluster.
For jmap: I know that it will pause the task manager.
So if I run the same jobs in my dev env, will I still be able to see a similar dump? I assume so.
If not, I would have to wait for a low-volume time to do it in production.
Also, if I recall, the dump is as big as the JVM memory, right? So if I have 10GB configured for the JVM, the dump will be a 10GB file?
# Operating system has 16GB total.
env.ssh.opts: -l flink -oStrictHostKeyChecking=no
cluster.evenly-spread-out-slots: true
taskmanager.memory.flink.size: 10240m
taskmanager.memory.jvm-metaspace.size: 2048m
taskmanager.numberOfTaskSlots: 16
parallelism.default: 1
high-availability: zookeeper
high-availability.storageDir: file:///mnt/flink/ha/flink_1_14/
high-availability.zookeeper.quorum: ...
high-availability.zookeeper.path.root: /flink_1_14
high-availability.cluster-id: /flink_1_14_cluster_0001
web.upload.dir: /mnt/flink/uploads/flink_1_14
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: file:///mnt/flink/checkpoints/flink_1_14
state.savepoints.dir: file:///mnt/flink/savepoints/flink_1_14
On Wed, Mar 30, 2022 at 2:16 AM 胡伟华 <huweihua....@gmail.com> wrote:
Hi, John
Could you tell us your application scenario? Is it a Flink session cluster with a lot of jobs?
Maybe you can try to dump the memory with jmap and use tools such as MAT to analyze whether there are abnormal classes and classloaders.
> On Mar 30, 2022 at 6:09 AM John Smith <java.dev....@gmail.com> wrote:
>
> Hi, running 1.14.4.
>
> My task manager still fails with java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JVM metaspace to load classes or there is a class loading leak.
>
> I have 2GB of metaspace configured: taskmanager.memory.jvm-metaspace.size: 2048m
>
> But the task nodes still fail.
>
> When looking at the UI metrics, the metaspace starts low. Now I see 85% usage. It seems to be a class loading leak at this point; how can we debug this issue?
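The 85% figure from the UI can also be read directly from the JVM. This is a minimal sketch. It assumes a HotSpot JVM whose memory pool is named "Metaspace" and whose limit comes from -XX:MaxMetaspaceSize, which Flink derives from taskmanager.memory.jvm-metaspace.size. MetaspaceUsage is a hypothetical helper.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

final class MetaspaceUsage {

    /** Finds the HotSpot "Metaspace" pool; empty on JVMs that name it differently. */
    static Optional<MemoryPoolMXBean> metaspacePool() {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> "Metaspace".equals(pool.getName()))
                .findFirst();
    }

    /** Percentage of the maximum in use, or -1 if no maximum is defined. */
    static double usedPercent(MemoryUsage usage) {
        long max = usage.getMax();
        return max > 0 ? 100.0 * usage.getUsed() / max : -1.0;
    }

    public static void main(String[] args) {
        metaspacePool().ifPresentOrElse(pool -> {
            MemoryUsage usage = pool.getUsage();
            System.out.printf("Metaspace used=%d bytes, max=%d bytes (%.1f%%)%n",
                    usage.getUsed(), usage.getMax(), usedPercent(usage));
        }, () -> System.out.println("no pool named Metaspace on this JVM"));
    }
}
```

A single high reading does not distinguish the two causes named in the error message. A value that keeps rising across job restarts, while the set of running jobs stays the same, points to a class loading leak rather than to a limit that is simply too small.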