[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311498#comment-17311498 ]

Yun Tang commented on FLINK-18712:
----------------------------------

The newly attached demo package is what [~lio_sy] gave me at the time to reproduce this problem; I forgot to upload it here to make the example public.

> Flink RocksDB statebackend memory leak issue
> --------------------------------------------
>
> Key: FLINK-18712
> URL: https://issues.apache.org/jira/browse/FLINK-18712
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Affects Versions: 1.10.0
> Reporter: Farnight
> Assignee: Yun Tang
> Priority: Critical
> Labels: usability
> Attachments: flink-demo-master.tgz
>
> When using RocksDB as our state backend, we found that it leads to a memory leak when restarting a job (manually or in the recovery case).
>
> How to reproduce:
> # Increase the RocksDB block cache size (e.g. 1G) to make the problem easier to monitor and reproduce.
> # Start a job using the RocksDB state backend.
> # When the RocksDB block cache reaches its maximum size, restart the job and monitor the memory usage (k8s pod working set) of the TM.
> # Repeat steps 2-3 a few more times; the memory usage keeps rising.
>
> Any solution or suggestion for this? Thanks!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
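To follow the reproduction steps above without cAdvisor, the TM's resident memory can also be sampled from inside the JVM between restarts. The Java sketch below is illustrative only: the class `RssProbe` and its methods are hypothetical (not part of Flink), it assumes a Linux `/proc` filesystem, and RSS is only an approximation of the k8s pod working set, which may also count active page cache.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper: reports the resident set size (RSS) of the current process on Linux. */
public final class RssProbe {

    /** Parses the "VmRSS:" line of a /proc/<pid>/status dump; returns KiB, or -1 if absent. */
    static long parseVmRssKib(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Expected format: "VmRSS:     123456 kB"
                String[] parts = line.trim().split("\\s+");
                if (parts.length >= 2) {
                    return Long.parseLong(parts[1]);
                }
            }
        }
        return -1;
    }

    /** Reads VmRSS of the current JVM; returns -1 when /proc is unavailable (e.g. on macOS). */
    static long currentRssKib() {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            return -1;
        }
        try {
            return parseVmRssKib(Files.readAllLines(status));
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample a few times; in a TaskManager you would log this around each job restart.
        for (int i = 0; i < 3; i++) {
            System.out.println("RSS (KiB): " + currentRssKib());
            Thread.sleep(1000);
        }
    }
}
```

If RSS keeps climbing across restarts while the block cache stays at its configured size, the growth is coming from native memory outside the cache.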
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309526#comment-17309526 ]

Yu Li commented on FLINK-18712:
-------------------------------

[~yunta] Could you add some description of the newly attached demo package? Thanks.
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234512#comment-17234512 ]

Yun Tang commented on FLINK-18712:
----------------------------------

Sorry for the late reply. I have already closed this ticket and will focus on FLINK-19125.
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233509#comment-17233509 ]

Robert Metzger commented on FLINK-18712:
----------------------------------------

If nobody objects in 24 hours, I'll remove this ticket from the 1.12 release.
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229687#comment-17229687 ]

Dian Fu commented on FLINK-18712:
---------------------------------

[~yunta] Does it make sense to close this issue and track the docker image improvement in FLINK-19125?
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215313#comment-17215313 ]

Yun Tang commented on FLINK-18712:
----------------------------------

[~Caesar] jemalloc can help with memory fragmentation. If applying my approach helps in your case, I think you are hitting the same bug.
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215118#comment-17215118 ]

Julius Michaelis commented on FLINK-18712:
------------------------------------------

[~yunta], may I ask whether you are still working on this? If so, could you check that this bug indeed does not appear with {{state.backend.rocksdb.memory.managed: false}}? I'd like to make sure that the bug I'm seeing is really the same one.
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194177#comment-17194177 ]

Farnight commented on FLINK-18712:
----------------------------------

[~yunta] Thanks a lot! We used the more general solution: we rebuilt the image to install {{libjemalloc-dev}} and added {{libjemalloc.so}} to {{LD_PRELOAD}} in the k8s YAML for the taskmanagers. So far, it works well. Thanks again!
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189138#comment-17189138 ]

Yun Tang commented on FLINK-18712:
----------------------------------

I used a [k8s session cluster|https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html#deploy-session-cluster] with the flink-1.11.1 image to reproduce this problem. The root cause is memory fragmentation in {{glibc}}'s allocator. You can refer to https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc and https://sourceware.org/bugzilla/show_bug.cgi?id=15321 for more information.

There are several ways to fix this:
* A quick but not very clean solution is to limit glibc's memory pool by setting {{MALLOC_ARENA_MAX}} to {{2}} in the environment of the taskmanagers in the k8s YAML:
{code:java}
env:
  - name: MALLOC_ARENA_MAX
    value: "2"
{code}
You can refer to https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max for more details.
* A more general solution is to rebuild the image to install {{libjemalloc-dev}} and add {{libjemalloc.so}} to {{LD_PRELOAD}} in the k8s YAML for the taskmanagers. I did not try tcmalloc, which might work as well.

I tried both of the above solutions, and both worked quite well, with no more endless growth in memory usage. I'll create another ticket to track the solution for this issue.
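After applying either fix, it can be worth confirming from inside the TaskManager JVM that the setting actually reached the process. The following is a minimal Java sketch, assuming a Linux TaskManager: it reads {{MALLOC_ARENA_MAX}} and {{LD_PRELOAD}} from the process environment and scans /proc/self/maps for a mapped libjemalloc. The class and method names are hypothetical and not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical diagnostic: which allocator mitigation is active in this process? */
public final class AllocatorCheck {

    /** True if any line of a /proc/<pid>/maps dump references a jemalloc shared object. */
    static boolean jemallocMapped(List<String> mapsLines) {
        for (String line : mapsLines) {
            if (line.contains("libjemalloc")) {
                return true;
            }
        }
        return false;
    }

    /** Parses MALLOC_ARENA_MAX from an environment map; -1 if unset or not a positive integer. */
    static int arenaMax(Map<String, String> env) {
        String value = env.get("MALLOC_ARENA_MAX");
        if (value == null) {
            return -1;
        }
        try {
            int n = Integer.parseInt(value.trim());
            return n > 0 ? n : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Reads this process's memory mappings; empty list if /proc is unavailable. */
    static List<String> readSelfMaps() {
        Path maps = Paths.get("/proc/self/maps");
        try {
            return Files.isReadable(maps) ? Files.readAllLines(maps) : Collections.emptyList();
        } catch (IOException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // glibc reads MALLOC_ARENA_MAX at process start, so the inherited environment is what counts.
        Map<String, String> env = System.getenv();
        System.out.println("MALLOC_ARENA_MAX = " + arenaMax(env));
        System.out.println("LD_PRELOAD = " + env.getOrDefault("LD_PRELOAD", "<unset>"));
        System.out.println("jemalloc mapped = " + jemallocMapped(readSelfMaps()));
    }
}
```

Checking the mappings rather than only {{LD_PRELOAD}} guards against a preload path that is set but points at a file missing from the image.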
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186516#comment-17186516 ]

Yun Tang commented on FLINK-18712:
----------------------------------

Some updates: I have contacted [~lio_sy] offline to confirm the steps to reproduce the problem, and I'm still debugging. I'll comment once I have any findings.
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186505#comment-17186505 ]

Julius Michaelis commented on FLINK-18712:
------------------------------------------

@Farnight, have you tried setting {{state.backend.rocksdb.memory.managed: false}} and checking whether that stops the behavior? (I think I'm seeing something quite similar; I'll try to build a small reproducer next week.)
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166322#comment-17166322 ]

Farnight commented on FLINK-18712:
----------------------------------

[~yunta], below is some testing information based on a simple job. Please help check. Thanks a lot!

Flink configs:
* Cluster: session-cluster mode, Flink version 1.10.
* TM: {{state.backend.rocksdb.memory.managed}} set to {{true}}; our k8s pod has 31G memory; managed memory set to 10G; heap size set to 15G; all other settings kept at their defaults.

Job:
# Write a dummy source function that emits events in a for/while loop.
# Use the default SessionWindow with a 30-minute gap.
# Run the job a few times.
# Monitor the k8s pod memory working set usage with cAdvisor.

Case 1: when running the job on k8s (JM/TM inside a pod container), the memory working set keeps rising and does not decrease even after the job is stopped. Eventually the TM process is killed by the oom-killer and restarted (the PID changes), after which the memory working set is reset.

Case 2: when running the job on my local machine (MacBook Pro) without a k8s environment, this issue does not occur.
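Plugging the reported numbers into a quick budget shows how much of the pod is left for memory that is neither JVM heap nor Flink managed memory. The Java sketch below assumes the "G" values are GiB and that heap and managed memory are the only large fixed reservations; the class name is hypothetical.

```java
/** Back-of-the-envelope TM memory budget using the numbers reported in this comment. */
public final class TmMemoryBudget {

    /** Memory (GiB) in the pod not claimed by the JVM heap or Flink managed memory. */
    static long headroomGib(long podGib, long heapGib, long managedGib) {
        long headroom = podGib - heapGib - managedGib;
        if (headroom < 0) {
            throw new IllegalArgumentException("heap + managed memory exceeds the pod limit");
        }
        return headroom;
    }

    public static void main(String[] args) {
        long pod = 31, heap = 15, managed = 10;
        // Network buffers, metaspace, thread stacks, JVM overhead and any native
        // allocations outside managed memory (e.g. fragmented glibc arenas) share this remainder.
        System.out.println("Headroom: " + headroomGib(pod, heap, managed) + " GiB");
    }
}
```

With roughly 6 GiB of headroom, native memory that grows a little on every job restart would plausibly reach the pod limit after enough restarts, which is consistent with the OOM kill in case 1.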
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165422#comment-17165422 ]

Farnight commented on FLINK-18712:
----------------------------------

Thanks a lot [~yunta]! We are trying a simple job (with all business-related logic removed) and testing to reproduce this. We will share more details after the testing.
[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue
[ https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165194#comment-17165194 ]

Yun Tang commented on FLINK-18712:
----------------------------------

[~lio_sy], do you have a simple job that reproduces this? And what happens if we move this job to YARN, which also kills a container once its memory exceeds the limit? I ask because I wonder whether k8s counts the OS page cache toward the pod's memory usage.

There are two ways to find out how much memory RocksDB uses:
# Turn on the block cache memory usage metric ([https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#state-backend-rocksdb-metrics-block-cache-usage]) when managed memory for RocksDB is enabled. Remember that all RocksDB instances within one slot report the same value.
# Use jemalloc and jeprof to see how much memory RocksDB allocates from the OS. This is much more precise than the first approach. You can refer to [https://github.com/jeffgriffith/native-jvm-leaks/#going-native-with-jemalloc] to rebuild jemalloc and preload the related ".so" file. For Flink, see [https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#forwarding-environment-variables] for how to pass the environment variables {{LD_PRELOAD}} and {{MALLOC_CONF}}.

By doing this, you can tell whether RocksDB has used unexpectedly much memory.
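Both measurement approaches can be expressed as flink-conf.yaml entries, which the Java sketch below generates. It rests on assumptions: the jemalloc path is hypothetical, the MALLOC_CONF sampling values are example settings in the style of the linked native-jvm-leaks guide, and heap profiling only works if jemalloc was built with profiling enabled. The {{containerized.taskmanager.env.*}} prefix applies where Flink launches the TaskManager containers itself (e.g. YARN); on a standalone k8s session cluster, set the same variables in the pod YAML instead. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical generator for flink-conf.yaml entries used to diagnose RocksDB native memory. */
public final class JemallocProfilingConf {

    // Prefix from the "Forwarding Environment Variables" section of the Flink 1.11 config docs.
    static final String TM_ENV_PREFIX = "containerized.taskmanager.env.";

    static Map<String, String> entries(String jemallocPath, String mallocConf) {
        Map<String, String> conf = new LinkedHashMap<>(); // keeps insertion order for output
        conf.put(TM_ENV_PREFIX + "LD_PRELOAD", jemallocPath);
        conf.put(TM_ENV_PREFIX + "MALLOC_CONF", mallocConf);
        // Approach 1: expose the RocksDB block cache usage metric.
        conf.put("state.backend.rocksdb.metrics.block-cache-usage", "true");
        return conf;
    }

    /** Renders entries as "key: value" lines, the format flink-conf.yaml uses. */
    static String toFlinkConf(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        conf.forEach((key, value) -> sb.append(key).append(": ").append(value).append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical library path; check where your image actually installs jemalloc.
        String lib = "/usr/lib/x86_64-linux-gnu/libjemalloc.so";
        // Example profiling settings (assumed values); requires a profiling-enabled jemalloc build.
        String mallocConf = "prof:true,lg_prof_interval:30,lg_prof_sample:17";
        System.out.print(toFlinkConf(entries(lib, mallocConf)));
    }
}
```

The heap profiles that jemalloc writes can then be inspected with jeprof to attribute native allocations to RocksDB or to other code.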