Hi. We have documented the same behaviour on Flink 1.4.2/1.6 running on YARN and Mesos. If you correlate non-heap memory with job restarts, you will see non-heap usage increase on every restart until you get an OOM.
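If you want to capture that correlation yourself, below is a minimal sketch that samples the JVM's non-heap usage via the standard MemoryMXBean (the class name NonHeapProbe and the 10-second interval are hypothetical). Note that RocksDB's native allocations live outside the JVM, so they will not show up here; what this surfaces is metaspace/code-cache growth, e.g. classloaders leaked across restarts.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class NonHeapProbe {
    public static void main(String[] args) throws InterruptedException {
        // Sample JVM non-heap usage periodically; log alongside job restart
        // timestamps and look for a step increase after each restart.
        while (true) {
            MemoryUsage nonHeap =
                ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
            System.out.printf("%d non-heap used=%d MB committed=%d MB%n",
                System.currentTimeMillis(),
                nonHeap.getUsed() / (1024 * 1024),
                nonHeap.getCommitted() / (1024 * 1024));
            Thread.sleep(10_000);
        }
    }
}

In practice you would compare these samples against restart events from the JobManager log; Flink's own Status.JVM.Memory metrics give you the same numbers if you already have a metrics reporter wired up.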
I'll let you know if/when I know how to handle the problem.

Med venlig hilsen / Best regards
Lasse Nedergaard

> On 3 Sep 2018, at 10:08, 祁明良 <[email protected]> wrote:
>
> Hi All,
>
> We are running Flink (version 1.5.2) on k8s with the RocksDB backend.
> Each time the job is cancelled and restarted, the container gets OOMKilled.
> In our case, we assign only 15% of the container memory to the JVM and leave
> the rest to RocksDB.
> To us, it looks like the memory used by RocksDB is not released after the job
> is cancelled. Can anyone give some suggestions?
> Our current temporary fix is to restart the TM pod after each job cancellation,
> but that has to be done manually.
>
> Regards,
> Mingliang
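One mitigation worth trying for the quoted setup is to bound RocksDB's native footprint so the TM container stays under its k8s limit. Below is a minimal sketch using Flink's OptionsFactory hook on the RocksDBStateBackend (the class name BoundedRocksDBOptions, the checkpoint URI, and all sizes are hypothetical and need tuning to your 15%/85% split). It does not by itself fix memory not being released on cancel, but it caps how far the memtables and block cache can grow.

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class BoundedRocksDBOptions implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // Fewer background compaction threads -> less transient native memory.
        return currentOptions.setMaxBackgroundCompactions(2);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions
            .setWriteBufferSize(64 * 1024 * 1024)   // cap each memtable at 64 MB
            .setMaxWriteBufferNumber(2)             // at most 2 memtables per state
            .setTableFormatConfig(
                new BlockBasedTableConfig()
                    .setBlockCacheSize(128 * 1024 * 1024)); // 128 MB block cache
    }
}

// Usage (checkpoint URI is hypothetical):
// RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
// backend.setOptions(new BoundedRocksDBOptions());
// env.setStateBackend(backend);

Keep in mind these caps apply per column family, i.e. per registered state, so the total native footprint still scales with the number of states and task slots on the TM.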
