https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/mem_setup.html

Thank you~

Xintong Song


On Fri, May 1, 2020 at 8:35 AM shao.hongxiao <[email protected]> wrote:

> Hello Song,
>
> Please refer to this document [1] for more details
>
> Could you send the exact link? I have also noticed that the memory
> parameters shown in the Flink UI look off, and I would like to read the
> relevant documentation carefully.
>
> Thanks!
>
> 邵红晓
> Email: [email protected]
>
> On 04/30/2020 12:08, Xintong Song wrote:
> Then I would suggest the following.
> - Check the task manager log to see if the '-D' properties are properly
>   loaded. They should be located at the beginning of the log file.
> - You can also try to log into the pod and check the JVM launch command
>   with "ps -ef | grep TaskManagerRunner". I suspect there might be some
>   argument passing problem regarding the spaces and double quotation
>   marks.
>
> Thank you~
>
> Xintong Song
>
>
> On Thu, Apr 30, 2020 at 11:39 AM Eleanore Jin <[email protected]>
> wrote:
>
> Hi Xintong,
>
> Thanks for the detailed explanation!
>
> As for the 2nd question: I mount it to an emptyDir. I assume a pod
> restart will not cause the pod to be rescheduled to another node, so it
> should stay? I verified this by adding the following directly to
> flink-conf.yaml, in which case I see the heap dump is taken and stays in
> the directory:
>
> env.java.opts: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps
>
> In addition, I also don't see the log print out something like "Heap dump
> file created [5220997112 bytes in 73.464 secs]", which I do see when
> adding the options directly in flink-conf.yaml.
>
> containers:
> - volumeMounts:
>   - mountPath: /dumps
>     name: heap-dumps
> volumes:
> - emptyDir: {}
>   name: heap-dumps
>
> Thanks a lot!
> Eleanore
>
>
> On Wed, Apr 29, 2020 at 7:55 PM Xintong Song <[email protected]>
> wrote:
>
> Hi Eleanore,
>
> I'd like to explain about 1 & 2. For 3, I have no idea either.
>
> 1. I don't see the heap size shown correctly in the UI for the task
>    manager
>
> Despite the 'heap' in the key, 'taskmanager.heap.size' accounts for the
> total memory of a Flink task manager, rather than only the heap memory. A
> Flink task manager process consumes not only Java heap memory, but also
> direct memory (e.g., network buffers) and native memory (e.g., JVM
> overhead). That's why the JVM heap size shown on the UI is much smaller
> than the configured 'taskmanager.heap.size'. Please refer to this
> document [1] for more details. This document comes from Flink 1.9 and has
> not been back-ported to 1.8, but the contents should apply to 1.8 as
> well.
>
> 2. I don't see the heap dump file in the restarted pod /dumps/oom.bin,
>    did I set the java opts wrong?
>
> The java options look good to me. Is the configured path '/dumps/oom.bin'
> a local path of the pod or a path of the host mounted onto the pod? The
> restarted pod is a completely new, different pod. Everything you write to
> the old pod goes away when the pod terminates, unless it is written to
> the host through mounted storage.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/mem_setup.html
>
>
> On Thu, Apr 30, 2020 at 7:41 AM Eleanore Jin <[email protected]>
> wrote:
>
> Hi All,
>
> Currently I am running a Flink job cluster (v1.8.2) on Kubernetes with 4
> pods, each pod with a parallelism of 4.
>
> The Flink job reads from a source topic with 96 partitions and applies a
> per-element filter. The filter value comes from a broadcast topic, always
> using the latest message as the filter criterion, and the job then
> publishes to a sink topic.
>
> There is no checkpointing or state involved.
>
> I am seeing "GC overhead limit exceeded" errors continuously, and the
> pods keep restarting.
>
> So I tried to increase the heap size for the task manager by
>
> containers:
> - args:
>   - task-manager
>   - -Djobmanager.rpc.address=service-job-manager
>   - -Dtaskmanager.heap.size=4096m
>   - -Denv.java.opts.taskmanager="-XX:+HeapDumpOnOutOfMemoryError
>     -XX:HeapDumpPath=/dumps/oom.bin"
>
> 3 things I noticed:
>
> 1. I don't see the heap size shown correctly in the UI for the task
>    manager
>
> 2. I don't see the heap dump file in the restarted pod /dumps/oom.bin,
>    did I set the java opts wrong?
>
> 3. I continuously see the logs below from all pods; not sure if this
>    causes any issue:
>
> {"@timestamp":"2020-04-29T23:39:43.387Z","@version":"1","message":"[Consumer clientId=consumer-1, groupId=aba774bc] Node 6 was unable to process the fetch request with (sessionId=2054451921, epoch=474): FETCH_SESSION_ID_NOT_FOUND.","logger_name":"org.apache.kafka.clients.FetchSessionHandler","thread_name":"pool-6-thread-1","level":"INFO","level_value":20000}
>
> Thanks a lot for any help!
>
> Best,
> Eleanore
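
The suspected argument-passing problem with spaces and double quotation marks can be illustrated with a small Python sketch. It only models what the container entrypoint receives, on the assumption that each entry of the pod's args list is handed over as one argument without any shell processing. It does not model what Flink's startup scripts do with the argument afterwards. The helper split_dynamic_property is hypothetical and exists only for this illustration.

```python
import shlex

# The last entry of the args list from the original message. In YAML this is
# a plain scalar, so the double quotes are ordinary characters in the string.
raw_arg = ('-Denv.java.opts.taskmanager="-XX:+HeapDumpOnOutOfMemoryError '
           '-XX:HeapDumpPath=/dumps/oom.bin"')


def split_dynamic_property(arg):
    """Split a '-Dkey=value' argument into (key, value).

    Hypothetical helper for illustration; not part of Flink.
    """
    key, _, value = arg[len("-D"):].partition("=")
    return key, value


# Exec-style launch (no shell involved): the entry arrives unchanged,
# so the value still carries the literal double quotes.
exec_key, exec_value = split_dynamic_property(raw_arg)

# Shell-style launch: POSIX word splitting removes the quotes but keeps the
# quoted text, including the space, inside a single token.
shell_tokens = shlex.split(raw_arg)
shell_key, shell_value = split_dynamic_property(shell_tokens[0])

print("without shell:", repr(exec_value))
print("with shell:   ", repr(shell_value))
```

If the value seen in the task manager log or in the "ps -ef" output still contains the quotation marks, that would be consistent with the first case. Setting env.java.opts in flink-conf.yaml, as reported earlier in the thread, avoids this path entirely.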
