Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread 周思华
Hi Lasse, I met that before. I think the non-heap memory trend in the graph you attached may be the "expected" result ... Because RocksDB keeps a filter (bloom filter) in memory for every opened SST file by default, and the number of SST files increases over time, so it looks
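
A minimal sketch of bounding that filter/index memory through Flink's RocksDB OptionsFactory (the class name and the 64 MB cache size are illustrative assumptions, not something stated in this thread):

    import org.apache.flink.contrib.streaming.state.OptionsFactory;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.ColumnFamilyOptions;
    import org.rocksdb.DBOptions;

    // Sketch: store index/filter (bloom) blocks in the shared block cache so
    // their memory is bounded by the cache size rather than growing with the
    // number of open SST files.
    public class BoundedFilterMemoryOptions implements OptionsFactory {
        @Override
        public DBOptions createDBOptions(DBOptions currentOptions) {
            return currentOptions; // no DB-level changes in this sketch
        }

        @Override
        public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
            BlockBasedTableConfig table = new BlockBasedTableConfig()
                    .setCacheIndexAndFilterBlocks(true)   // count them against the cache
                    .setBlockCacheSize(64 * 1024 * 1024); // 64 MB; tune per workload
            return currentOptions.setTableFormatConfig(table);
        }
    }

Registered via rocksDbBackend.setOptions(new BoundedFilterMemoryOptions()); the trade-off is extra block-cache pressure and possibly higher read latency.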

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Ted Yu
Please see the last comment on this issue: https://github.com/facebook/rocksdb/issues/3216 FYI On Tue, Apr 10, 2018 at 12:25 AM, Lasse Nedergaard <lassenederga...@gmail.com> wrote: > > This graph shows Non-Heap. If the same pattern exists it makes sense that > it will try to allocate more

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Lasse Nedergaard
> Date: 4/10/18 12:25 AM (GMT-08:00) > To: Ken Krugler <kkrugler_li...@transpac.com> > Cc: user <user@flink.apache.org>, Chesnay Schepler <ches...@apache.org> > Subject: Re: java.lang.Exception: TaskManager was lost/killed > > > This graph shows Non-Heap. If the s

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Ted Yu
Chesnay Schepler <ches...@apache.org> Subject: Re: java.lang.Exception: TaskManager was lost/killed This graph shows Non-Heap. If the same pattern exists it makes sense that it will try to allocate more memory and then exceed the limit. I can see the trend for all other containers tha

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Lasse Nedergaard
This graph shows Non-Heap. If the same pattern exists it makes sense that it will try to allocate more memory and then exceed the limit. I can see the trend for all the other containers that have been killed. So my question is now: what is using non-heap memory? From

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-10 Thread Lasse Nedergaard
Hi. I found the exception attached below, for our simple job. It states that our task manager was killed due to exceeding the memory limit of 2.7 GB. But when I look at the Flink metrics just 30 sec before, it used 1.3 GB heap and 712 MB non-heap, around 2 GB in total. So something else is also using memory inside the
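
A hedged sketch of the knob that addresses exactly this gap on YARN: the container cutoff reserves room for non-heap/native allocations such as RocksDB (values are illustrative, not recommendations):

    # flink-conf.yaml -- illustrative values; tune for your containers
    # Fraction of the container size reserved for non-heap/native memory
    # (RocksDB, JVM overhead). Flink 1.4 defaults to 0.25.
    containerized.heap-cutoff-ratio: 0.35
    # Lower bound for that reservation, in MB (default 600).
    containerized.heap-cutoff-min: 600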

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Ken Krugler
Hi Chesnay, Don’t know if this helps, but I’d run into this as well, though I haven’t hooked up YourKit to analyze exactly what’s causing the memory problem. E.g. after about 3.5 hours running locally, it failed with memory issues. In the TaskManager logs, I start seeing exceptions in my

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Hao Sun
Same story here, 1.3.2 on K8s. Very hard to find the reason why a TM is killed. Not likely caused by a memory leak. If there is a logger I should turn on, please let me know. On Mon, Apr 9, 2018, 13:41 Lasse Nedergaard wrote: > We see the same running 1.4.2 on YARN hosted

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Lasse Nedergaard
We see the same running 1.4.2 on YARN hosted on an AWS EMR cluster. The only thing I can find in the logs is SIGTERM with code 15 or -100. Today our simple job reading from Kinesis and writing to Cassandra was killed. The other day, in another job, I identified a map state.remove command

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Chesnay Schepler
We will need more information to offer any solution. The exception simply means that a TaskManager shut down, for which there are a myriad of possible explanations. Please have a look at the TaskManager logs; they may contain a hint as to why it shut down. On 09.04.2018 16:01, Javier Lopez

Re: Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Javier Lopez
Hi, "are you moving the job jar to the ~/flink-1.4.2/lib path ? " -> Yes, to every node in the cluster. On 9 April 2018 at 15:37, miki haiat wrote: > Javier > "adding the jar file to the /lib path of every task manager" > are you moving the job jar to the*

Re: Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread miki haiat
Javier "adding the jar file to the /lib path of every task manager" are you moving the job jar to the* ~/flink-1.4.2/lib path* ? On Mon, Apr 9, 2018 at 12:23 PM, Javier Lopez wrote: > Hi, > > We had the same metaspace problem, it was solved by adding the jar file to >

Re: Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Javier Lopez
Hi, We had the same metaspace problem; it was solved by adding the jar file to the /lib path of every task manager, as explained here: https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html#avoiding-dynamic-classloading. As well, we added these Java
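
A minimal sketch of that lib-path fix (host, user, and paths are assumptions; adjust to your install). The Java options Javier mentions are truncated above, so none are reproduced here:

    # Copy the job jar next to Flink's own jars on every node, then restart
    # the TaskManagers, so job classes are loaded once by the parent
    # classloader instead of once per job restart.
    scp target/my-job.jar flink@node1:/home/hadoop/flink-1.4.2/lib/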

Re: Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Alexander Smirnov
I've seen a similar problem, but it was not the heap size, it was Metaspace. It was caused by a job restarting in a loop. It looks like for each restart Flink loads new instances of the classes, and very soon it runs out of metaspace. I've created a JIRA issue for this problem, but got no response from the
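
One way to make that leak fail fast instead of silently killing the container, as a hedged sketch (the flag value is an assumption, not from this thread):

    # flink-conf.yaml -- illustrative cap; an exhausted metaspace then surfaces
    # as java.lang.OutOfMemoryError: Metaspace in the TaskManager log
    env.java.opts: -XX:MaxMetaspaceSize=256m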

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-08 Thread TechnoMage
I have seen this when my task manager ran out of RAM. Increase the heap size in flink-conf.yaml: taskmanager.heap.mb and jobmanager.heap.mb. Michael > On Apr 8, 2018, at 2:36 AM, 王凯 wrote: > > > Hi all, recently I found a problem; it runs well when
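
A minimal flink-conf.yaml sketch of that change (sizes are illustrative, not recommendations; on YARN they must leave headroom below the container limit, as discussed above):

    # flink-conf.yaml -- illustrative sizes
    taskmanager.heap.mb: 4096
    jobmanager.heap.mb: 1024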