Why do the JDBC jars need to be on the job manager node though?

On Mon, May 2, 2022 at 9:36 AM Chesnay Schepler <ches...@apache.org> wrote:

> yes.
> But if you can ensure that the driver isn't bundled by any user-jar you
> can also skip the pattern configuration step.
>
> The pattern looks correct formatting-wise; you could try whether
> com.microsoft.sqlserver.jdbc. is enough to solve the issue.
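
For what it's worth, here is a minimal sketch of how I understand the pattern list to behave, assuming the value is a semicolon-separated list of prefixes that are compared against the fully qualified class name. ParentFirstPatterns and its methods are hypothetical names for illustration only, not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Flink): illustrates prefix-style pattern matching. */
public class ParentFirstPatterns {

    /** Splits a semicolon-separated pattern list, dropping empty entries. */
    public static List<String> parse(String configValue) {
        List<String> patterns = new ArrayList<>();
        for (String part : configValue.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    /** Returns true if the class name starts with any of the given prefixes. */
    public static boolean matches(String className, List<String> patterns) {
        for (String pattern : patterns) {
            if (className.startsWith(pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> patterns = parse("org.apache.ignite.;com.microsoft.sqlserver.jdbc.");
        System.out.println(matches("com.microsoft.sqlserver.jdbc.SQLServerDriver", patterns));
        System.out.println(matches("org.apache.ignite.IgniteJdbcThinDriver", patterns));
        System.out.println(matches("org.postgresql.Driver", patterns));
    }
}
```

Under that assumption the trailing dot matters: "com.microsoft.sqlserver.jdbc." covers that package and its sub-packages, and nothing else that merely shares the leading characters.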
>
> On 02/05/2022 14:41, John Smith wrote:
>
> Oh, so I should copy the jars to the lib folder and
> set classloader.parent-first-patterns.additional:
> "org.apache.ignite.;com.microsoft.sqlserver.jdbc." to both the task
> managers and job managers?
>
> Also is my pattern correct?
> "org.apache.ignite.;com.microsoft.sqlserver.jdbc."
>
> Just to be sure I'm running a standalone cluster using zookeeper. So I
> have 3 zookeepers, 3 job managers and 3 task managers.
>
>
> On Mon, May 2, 2022 at 2:57 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> And you should make sure that it is set for both processes!
>>
>> On 02/05/2022 08:43, Chesnay Schepler wrote:
>>
>> The setting itself isn't taskmanager specific; it applies to both the
>> job- and taskmanager process.
>>
>> On 02/05/2022 05:29, John Smith wrote:
>>
>> Also just to be sure this is a Task Manager setting right?
>>
>> On Thu, Apr 28, 2022 at 11:13 AM John Smith <java.dev....@gmail.com>
>> wrote:
>>
>>> I assume you will take action on your side to track and fix the doc? :)
>>>
>>> On Thu, Apr 28, 2022 at 11:12 AM John Smith <java.dev....@gmail.com>
>>> wrote:
>>>
>>>> Ok so to summarize...
>>>>
>>>> - Build my job jar and have the JDBC driver as a compile only
>>>> dependency and copy the JDBC driver to flink lib folder.
>>>>
>>>> Or
>>>>
>>>> - Build my job jar and include the JDBC driver in the shadow jar, plus
>>>> copy the JDBC driver into the Flink lib folder, plus make an entry in
>>>> the config for classloader.parent-first-patterns.additional
>>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>
>>>>
>>>>
>>>> On Thu, Apr 28, 2022 at 10:17 AM Chesnay Schepler <ches...@apache.org>
>>>> wrote:
>>>>
>>>>> I think what I meant was "either add it to /lib, or [if it is already
>>>>> in /lib but also bundled in the jar] add it to the parent-first patterns."
>>>>>
>>>>> On 28/04/2022 15:56, Chesnay Schepler wrote:
>>>>>
>>>>> Pretty sure, even though I seemingly documented it incorrectly :)
>>>>>
>>>>> On 28/04/2022 15:49, John Smith wrote:
>>>>>
>>>>> You sure?
>>>>>
>>>>>    - *JDBC*: JDBC drivers leak references outside the user code
>>>>>    classloader. To ensure that these classes are only loaded once you
>>>>>    should either add the driver jars to Flink’s lib/ folder, or add the
>>>>>    driver classes to the list of parent-first loaded class via
>>>>>    classloader.parent-first-patterns-additional
>>>>>    <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>.
>>>>>
>>>>>    It says either or
>>>>>
>>>>>
>>>>> On Wed, Apr 27, 2022 at 3:44 AM Chesnay Schepler <ches...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> You're misinterpreting the docs.
>>>>>>
>>>>>> The parent/child-first classloading controls where Flink looks for a
>>>>>> class *first*, specifically whether we first load from /lib or the
>>>>>> user-jar.
>>>>>> It does not allow you to load something from the user-jar in the
>>>>>> parent classloader. That's just not how it works.
>>>>>>
>>>>>> It must be in /lib.
>>>>>>
>>>>>> On 27/04/2022 04:59, John Smith wrote:
>>>>>>
>>>>>> Hi Chesnay as per the docs...
>>>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/debugging_classloading/
>>>>>>
>>>>>> You can either put the jars in the task manager lib folder or use
>>>>>> classloader.parent-first-patterns.additional
>>>>>> <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#classloader-parent-first-patterns-additional>
>>>>>>
>>>>>> I prefer the latter, because that way the dependency stays with the
>>>>>> user-jar and not on the task manager.
>>>>>>
>>>>>> On Tue, Apr 26, 2022 at 9:52 PM John Smith <java.dev....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok so I should put the Apache ignite and my Microsoft drivers in the
>>>>>>> lib folders of my task managers?
>>>>>>>
>>>>>>> And then in my job jar only include them as compile time
>>>>>>> dependencies?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 26, 2022 at 10:42 AM Chesnay Schepler <
>>>>>>> ches...@apache.org> wrote:
>>>>>>>
>>>>>>>> JDBC drivers are well-known for leaking classloaders unfortunately.
>>>>>>>>
>>>>>>>> You have correctly identified your alternatives.
>>>>>>>>
>>>>>>>> You must put the jdbc driver into /lib instead. Setting only the
>>>>>>>> parent-first pattern shouldn't affect anything.
>>>>>>>> That is only relevant if something is in both /lib and the
>>>>>>>> user-jar, telling Flink to prioritize what is in /lib.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 26/04/2022 15:35, John Smith wrote:
>>>>>>>>
>>>>>>>> So I put classloader.parent-first-patterns.additional:
>>>>>>>> "org.apache.ignite." in the task config and so far I don't think I'm
>>>>>>>> getting "java.lang.OutOfMemoryError: Metaspace" any more.
>>>>>>>>
>>>>>>>> Or it's too early to tell.
>>>>>>>>
>>>>>>>> Though now, the task managers are shutting down due to some
>>>>>>>> other failures.
>>>>>>>>
>>>>>>>> So maybe because tasks were failing and reloading often, the task
>>>>>>>> manager was running out of Metaspace. But now maybe it's just
>>>>>>>> cleanly shutting down.
>>>>>>>>
>>>>>>>> On Wed, Apr 20, 2022 at 11:35 AM John Smith <java.dev....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Or can I set the config to treat org.apache.ignite. classes as
>>>>>>>>> parent-first?
>>>>>>>>>
>>>>>>>>> On Tue, Apr 19, 2022 at 10:18 PM John Smith <
>>>>>>>>> java.dev....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Ok, so I loaded the dump into Eclipse MAT and followed:
>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
>>>>>>>>>>
>>>>>>>>>> - On the Histogram, I got over 30 entries for:
>>>>>>>>>> ChildFirstClassLoader
>>>>>>>>>> - Then I clicked on one of them "Merge Shortest Path..." and
>>>>>>>>>> picked "Exclude all phantom/weak/soft references"
>>>>>>>>>> - Which then gave me: SqlDriverManager > Apache Ignite JdbcThin
>>>>>>>>>> Driver
>>>>>>>>>>
>>>>>>>>>> So I'm guessing this applies to anything JDBC-based. Should I copy
>>>>>>>>>> the drivers into the task manager lib folder and have my jobs
>>>>>>>>>> declare them as compile-only dependencies?
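
A small sketch that might help confirm the MAT finding from inside a running JVM: it lists every JDBC driver currently visible through java.sql.DriverManager together with the classloader that defined it. DriverClassLoaderReport is a hypothetical name, and calling it from a job (for example from an operator's open method) is only an assumption about where it would be useful. As far as I know, DriverManager.getDrivers() only returns drivers that are visible to the caller's classloader, so the result depends on where it is called from.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic helper: reports which classloader defined each JDBC driver. */
public class DriverClassLoaderReport {

    /** Returns one line per registered driver: driver class and defining classloader. */
    public static List<String> describeDrivers() {
        List<String> lines = new ArrayList<>();
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            ClassLoader loader = driver.getClass().getClassLoader();
            // A null loader means the class was defined by the bootstrap classloader.
            String loaderName = (loader == null) ? "bootstrap" : loader.getClass().getName();
            lines.add(driver.getClass().getName() + " loaded by " + loaderName);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeDrivers()) {
            System.out.println(line);
        }
    }
}
```

If a driver shows up as loaded by a ChildFirstClassLoader, it came from the job jar rather than from the lib folder, which would be consistent with the path MAT reported.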
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 19, 2022 at 12:18 PM Yaroslav Tkachenko <
>>>>>>>>>> yaros...@goldsky.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Also
>>>>>>>>>>> https://shopify.engineering/optimizing-apache-flink-applications-tips
>>>>>>>>>>> might be helpful (it has a section on profiling, as well as
>>>>>>>>>>> classloading).
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 19, 2022 at 4:35 AM Chesnay Schepler <
>>>>>>>>>>> ches...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We have a very rough "guide" in the wiki (it's just the
>>>>>>>>>>>> specific steps I took to debug another leak):
>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
>>>>>>>>>>>>
>>>>>>>>>>>> On 19/04/2022 12:01, huweihua wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi, John
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for the late reply. You can use MAT[1] to analyze the
>>>>>>>>>>>> dump file. Check whether there are too many loaded classes.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://www.eclipse.org/mat/
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 18, 2022, at 9:55 PM, John Smith <java.dev....@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi, can anyone help with this? I never looked at a dump file
>>>>>>>>>>>> before.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 14, 2022 at 11:59 AM John Smith <
>>>>>>>>>>>> java.dev....@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, so I have a dump file. What do I look for?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 31, 2022 at 3:28 PM John Smith <
>>>>>>>>>>>>> java.dev....@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok so if there's a leak, if I manually stop the job and
>>>>>>>>>>>>>> restart it from the UI multiple times, I won't see the issue
>>>>>>>>>>>>>> because the classes are unloaded correctly?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Mar 31, 2022 at 9:20 AM huweihua <
>>>>>>>>>>>>>> huweihua....@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The difference is that manually canceling the job stops the
>>>>>>>>>>>>>>> JobMaster, but automatic failover keeps the JobMaster running.
>>>>>>>>>>>>>>> From the TaskManager's point of view, though, it doesn't make
>>>>>>>>>>>>>>> much difference.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mar 31, 2022, at 4:01 AM, John Smith <java.dev....@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also if I manually cancel and restart the same job over and
>>>>>>>>>>>>>>> over, is it the same as if Flink was restarting a job due to
>>>>>>>>>>>>>>> failure?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I.e.: when I click "Cancel Job" on the UI, is the job
>>>>>>>>>>>>>>> completely unloaded, vs. when the job scheduler restarts a job
>>>>>>>>>>>>>>> for whatever reason?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Like this I'll stop and restart the job a few times or maybe
>>>>>>>>>>>>>>> I can trick my job to fail and have the scheduler restart it.
>>>>>>>>>>>>>>> Ok let me think about this...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Mar 30, 2022 at 10:24 AM 胡伟华 <huweihua....@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So if I run the same jobs in my dev env will I still be
>>>>>>>>>>>>>>>> able to see the similar dump?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think running the same job in dev should be reproducible,
>>>>>>>>>>>>>>>> maybe you can have a try.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If not I would have to wait for a low-volume time to do it
>>>>>>>>>>>>>>>> on production. Also, if I recall, the dump is as big as the JVM
>>>>>>>>>>>>>>>> memory, right? So if I have 10GB configured for the JVM, the dump
>>>>>>>>>>>>>>>> will be a 10GB file?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, jmap will pause the JVM; the length of the pause depends on
>>>>>>>>>>>>>>>> the size of the dump. You can use "jmap -dump:live" to dump only
>>>>>>>>>>>>>>>> the reachable objects; this will only take a brief pause.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mar 30, 2022, at 9:47 PM, John Smith <java.dev....@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have 3 task managers (see config below). There is a total
>>>>>>>>>>>>>>>> of 10 jobs with 25 slots being used.
>>>>>>>>>>>>>>>> The jobs are 100% ETL, i.e. they load JSON, transform it and
>>>>>>>>>>>>>>>> push it to JDBC. Only 1 job of the 10 is pushing to an Apache
>>>>>>>>>>>>>>>> Ignite cluster.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For jmap: I know that it will pause the task manager. So if
>>>>>>>>>>>>>>>> I run the same jobs in my dev env, will I still be able to see
>>>>>>>>>>>>>>>> a similar dump? I assume so. If not I would have to wait for a
>>>>>>>>>>>>>>>> low-volume time to do it on production. Also, if I recall, the
>>>>>>>>>>>>>>>> dump is as big as the JVM memory, right? So if I have 10GB
>>>>>>>>>>>>>>>> configured for the JVM, the dump will be a 10GB file?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Operating system has 16GB total.
>>>>>>>>>>>>>>>> env.ssh.opts: -l flink -oStrictHostKeyChecking=no
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cluster.evenly-spread-out-slots: true
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> taskmanager.memory.flink.size: 10240m
>>>>>>>>>>>>>>>> taskmanager.memory.jvm-metaspace.size: 2048m
>>>>>>>>>>>>>>>> taskmanager.numberOfTaskSlots: 16
>>>>>>>>>>>>>>>> parallelism.default: 1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> high-availability: zookeeper
>>>>>>>>>>>>>>>> high-availability.storageDir: file:///mnt/flink/ha/flink_1_14/
>>>>>>>>>>>>>>>> high-availability.zookeeper.quorum: ...
>>>>>>>>>>>>>>>> high-availability.zookeeper.path.root: /flink_1_14
>>>>>>>>>>>>>>>> high-availability.cluster-id: /flink_1_14_cluster_0001
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> web.upload.dir: /mnt/flink/uploads/flink_1_14
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> state.backend: rocksdb
>>>>>>>>>>>>>>>> state.backend.incremental: true
>>>>>>>>>>>>>>>> state.checkpoints.dir: file:///mnt/flink/checkpoints/flink_1_14
>>>>>>>>>>>>>>>> state.savepoints.dir: file:///mnt/flink/savepoints/flink_1_14
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Mar 30, 2022 at 2:16 AM 胡伟华 <huweihua....@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi, John
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Could you tell us your application scenario? Is it a Flink
>>>>>>>>>>>>>>>>> session cluster with a lot of jobs?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maybe you can try to dump the memory with jmap and use
>>>>>>>>>>>>>>>>> tools such as MAT to analyze whether there are abnormal
>>>>>>>>>>>>>>>>> classes and classloaders.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > On Mar 30, 2022, at 6:09 AM, John Smith <java.dev....@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Hi running 1.14.4
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > My task manager still fails with
>>>>>>>>>>>>>>>>> > java.lang.OutOfMemoryError: Metaspace. The metaspace
>>>>>>>>>>>>>>>>> > out-of-memory error has occurred. This can mean two things:
>>>>>>>>>>>>>>>>> > either the job requires a larger size of JVM metaspace to load
>>>>>>>>>>>>>>>>> > classes or there is a class loading leak.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > I have 2GB of metaspace configured:
>>>>>>>>>>>>>>>>> > taskmanager.memory.jvm-metaspace.size: 2048m
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > But the task nodes still fail.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > When looking at the UI metrics, the metaspace starts
>>>>>>>>>>>>>>>>> > low. Now I see 85% usage. It seems to be a class loading leak
>>>>>>>>>>>>>>>>> > at this point, how can we debug this issue?
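
One rough way to watch this outside the UI is to read the same numbers through the standard JMX beans. The sketch below assumes a HotSpot JVM, where the memory pool is named "Metaspace"; MetaspaceProbe is a hypothetical name, and how it would be wired into a job or a side process is left open.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical probe: reads Metaspace usage and class loading counters via JMX. */
public class MetaspaceProbe {

    /** Returns used Metaspace bytes, or -1 if no pool named "Metaspace" is found. */
    public static long usedMetaspaceBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                // getUsage() may return null if the pool is no longer valid.
                return (usage == null) ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    /** Summarizes currently loaded, total loaded and unloaded class counts. */
    public static String classCounts() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return "loaded=" + bean.getLoadedClassCount()
                + " totalLoaded=" + bean.getTotalLoadedClassCount()
                + " unloaded=" + bean.getUnloadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("metaspace used bytes: " + usedMetaspaceBytes());
        System.out.println(classCounts());
    }
}
```

If the loaded count keeps climbing across job restarts while the unloaded count barely moves, that would be consistent with a class loading leak rather than a metaspace that is simply too small, though it does not prove it.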
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>
>>
>
