Re: CodeCache is full - Issues with job deployments

2018-12-13 Thread Stefan Richter
Hi,

Thanks for analyzing the problem. If it turns out that there is a problem with 
the termination of the Kafka sources, could you please open an issue for that 
with your results?

Best,
Stefan




Re: CodeCache is full - Issues with job deployments

2018-12-11 Thread PedroMrChaves
Hello Stefan,

Thank you for the reply.

I've taken a heap dump from a development cluster using jmap and analysed
it. For the test, we restarted the cluster and left a job running for
a few minutes. We then restarted the job a couple of times and stopped
it. After leaving the cluster with no running jobs for 20 minutes, we
took a heap dump.

We found that a thread which consumes data from Kafka was still
running, with a lot of finalizer calls, as depicted below.
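
A lighter-weight way to check for such a leftover thread, without a full
heap dump, is to list the live threads of the JVM by name. The following is
only a minimal sketch: the class name LingeringThreads and the name fragment
"Kafka" are assumptions for illustration, not anything taken from Flink or
from the Kafka client. It only sees threads of the JVM it runs in, so it
would have to be called from code running inside the TaskManager.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: lists live threads whose name contains a fragment.
final class LingeringThreads {

    /** Returns the sorted names of live threads containing the fragment. */
    static List<String> withNameContaining(String fragment) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(Thread::isAlive)
                .map(Thread::getName)
                .filter(name -> name.contains(fragment))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "Kafka" is an assumed default; pass another fragment as argument.
        String fragment = args.length > 0 ? args[0] : "Kafka";
        withNameContaining(fragment).forEach(System.out::println);
    }
}
```

From outside the process, running jstack against the TaskManager PID gives
the same thread list together with stack traces.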

I will deploy a job without a Kafka consumer to see if the code cache still
increases (all of our clusters have problems with the code cache and,
coincidentally, all of the deployed jobs read from Kafka).


Best Regards,
Pedro Chaves



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: CodeCache is full - Issues with job deployments

2018-12-11 Thread Stefan Richter
Hi,

In general, Flink uses a user-code class loader for job-specific code, and the
lifecycle of the class loader should end with the job. This usually means that
job-related code can be removed after the job has finished. However, objects
of a class that was loaded by the user-code class loader should no longer be
referenced from anywhere after the job has finished, or else the user-code
class loader cannot be freed. Whether that is the case depends on the user
code and the dependencies it uses, e.g. the user code might register some
objects somewhere and not remove them by the end of the job. This would
prevent freeing the user-code class loader and result in a leak. To figure out
the root cause, you can take and analyse a heap dump for leaking class
loaders; [1] and other sources on the web go deeper into this topic.
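
To make the "register some objects somewhere" case concrete, here is a
minimal sketch of the pattern. All names in it (RegistryLeakSketch,
GLOBAL_REGISTRY, JobResource, pins) are hypothetical and only illustrate the
idea; they are not Flink APIs. A static registry stands in for any JVM-wide
structure that outlives the job. As long as it holds an object, the class of
that object, and therefore the class loader that loaded it, stays reachable.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class RegistryLeakSketch {

    // Hypothetical stand-in for a JVM-wide registry that outlives any job.
    static final Set<Object> GLOBAL_REGISTRY = ConcurrentHashMap.newKeySet();

    /** Hypothetical job-scoped resource that registers itself globally. */
    static final class JobResource implements AutoCloseable {
        JobResource() {
            GLOBAL_REGISTRY.add(this);
        }

        /** Deregisters the object so nothing long-lived references it. */
        @Override
        public void close() {
            GLOBAL_REGISTRY.remove(this);
        }
    }

    /** True if a registered object's class was loaded by the given loader. */
    static boolean pins(ClassLoader loader) {
        return GLOBAL_REGISTRY.stream()
                .anyMatch(o -> o.getClass().getClassLoader() == loader);
    }
}
```

In this sketch the class loader is simply whichever one loaded the sketch
itself; in a job it would be the user-code class loader. The point is that
pins(...) stays true until close() is called, so the cleanup has to run when
the job ends, for example from the close() method of a rich function.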

Best,
Stefan

[1] http://java.jiderhamn.se/category/classloader-leaks/ 





CodeCache is full - Issues with job deployments

2018-12-11 Thread PedroMrChaves
Hello,

Every time I deploy a Flink job the code cache increases, which is expected.
However, when I stop and start the job, or it restarts, the code cache
continues to increase.

Screenshot_2018-12-11_at_11.png

I've added the flags "-XX:+PrintCompilation -XX:ReservedCodeCacheSize=350m
-XX:-UseCodeCacheFlushing" to the Flink taskmanagers and jobmanagers, but the
cache doesn't decrease very much, as depicted in the screenshot above.
Even if I stop all the jobs, the cache doesn't decrease.
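
For tracking the code cache over time without screenshots, the JVM exposes
its memory pools through the standard java.lang.management API. The
following is a minimal sketch under stated assumptions: the class name
CodeCacheUsage is made up, and matching pools by the substring "code" is a
heuristic (HotSpot names the pool "Code Cache" on Java 8 and splits it into
several "CodeHeap ..." pools on newer versions). It reports on the JVM it
runs in, so it would need to run inside the taskmanager or jobmanager.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.Locale;

final class CodeCacheUsage {

    /** Sums used bytes over non-heap pools whose name mentions "code". */
    static long usedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName().toLowerCase(Locale.ROOT);
            if (pool.getType() == MemoryType.NON_HEAP && name.contains("code")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) { // null if the pool is no longer valid
                    used += usage.getUsed();
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("Code cache used (bytes): " + usedBytes());
    }
}
```

Logging this value before and after each job restart shows whether the
growth is tied to redeployments. Also note that boolean HotSpot flags are
enabled with "+" and disabled with "-", so "-XX:-UseCodeCacheFlushing" turns
code cache flushing off rather than on.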

This gets to a point where I get the error "CodeCache is full. Compiler has
been disabled".

I've attached the taskmanager's output with the "-XX:+PrintCompilation" flag
activated.

flink-flink-taskexecutor.out

Flink: 1.6.2
Java:  openjdk version "1.8.0_191"

Best Regards,
Pedro Chaves.



