Thanks for the reply, Dawid. The Flink jobs are deployed on a YARN cluster. I am
seeing the error in the Job Manager log for some jobs quite frequently. I'm using
Flink 1.4.2, and I'm running only streaming jobs.
Hi,
I got the same exception when running in a Flink cluster. The settings are
below:
Flink version: 1.5.4
flink-conf.yaml:
jobmanager.heap.mb: 102400
taskmanager.heap.mb: 102400
taskmanager.numberOfTaskSlots: 40
parallelism.default: 40
I have 5 task managers.
My code just reads HBase table data an
Hi,
Could you provide us with some more information? Which version of Flink
are you running? In which cluster setup? When does this exception occur?
This exception says that the request for the status overview (number of
TaskManagers, slot info, etc.) failed.
Best,
Dawid
On 31/10/2018 20:05, Anil wrote:
>
Alright. Glad to hear that things are now working :-)
On Tue, Sep 26, 2017 at 9:55 AM, Steven Wu wrote:
Till, sorry for the confusion. I meant that the Flink documentation has the
correct info. Our code was mistakenly referring to akka.ask.timeout for the
death watch.
On Mon, Sep 25, 2017 at 3:52 PM, Till Rohrmann wrote:
Quick question, Steven. Where did you find the documentation saying that
the death watch interval is linked to the akka ask timeout? It was included
in the past, but I couldn't find it anymore.
Cheers,
Till
On Mon, Sep 25, 2017 at 9:47 AM, Till Rohrmann wrote:
Great to hear that you could figure things out, Steven.
You are right. The death watch is no longer linked to the akka ask timeout,
because of FLINK-6495. Thanks for the feedback. I will correct the
documentation.
Cheers,
Till
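For anyone tuning this: with the death watch decoupled from the ask timeout,
the relevant knobs live in flink-conf.yaml. A minimal sketch with illustrative
values (key names and defaults may differ between Flink versions, so check the
configuration docs for your release):
akka.ask.timeout: 10 s
# the death watch is configured separately from the ask timeout
akka.watch.heartbeat.interval: 10 s
akka.watch.heartbeat.pause: 60 s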
On Sat, Sep 23, 2017 at 10:24 AM, Steven Wu wrote:
Just to close the thread: the akka death watch was triggered by high GC pause,
which was caused by a memory leak in our code during Flink job restart.
Noted that akka.ask.timeout wasn't related to the akka death watch, which
Flink has documented and linked.
On Sat, Aug 26, 2017 at 10:58 AM, Steven Wu wrote:
This is a stateless job, so we don't use RocksDB.
Yeah, the network can also be a possibility; will keep it on the radar.
Unfortunately, our metrics system doesn't have the TCP metrics when running
inside containers.
On Fri, Aug 25, 2017 at 2:09 PM, Robert Metzger wrote:
Hi,
are you using the RocksDB state backend already?
Maybe writing the state to disk would actually reduce the pressure on the
GC (but of course it'll also reduce throughput a bit).
Are there any known issues with the network? Maybe the network bursts on
restart cause the timeouts?
On Fri, Aug 25
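For reference, enabling the RocksDB state backend Robert mentions is a
flink-conf.yaml change. A minimal sketch with a hypothetical checkpoint path
(the directory key name varies across Flink versions):
state.backend: rocksdb
# hypothetical checkpoint location; some older releases use state.backend.fs.checkpointdir
state.checkpoints.dir: hdfs:///flink/checkpoints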
Bowen,
The heap size is ~50G. CPU was actually pretty low (like <20%) when the high
GC pauses and akka timeouts were happening, so maybe memory allocation and GC
weren't really the issue. I also recently learned that the JVM can pause while
writing the GC log due to disk I/O; that is another lead I am pursuing.
Thanks,
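If GC-log writes stalling the JVM turn out to be the culprit, one commonly
suggested mitigation is to keep the GC log off slow disks, e.g. on tmpfs. A
sketch for a JDK 8-era JVM, passed through env.java.opts in flink-conf.yaml
(the flags and the /dev/shm path are illustrative, not taken from this thread):
# write GC logs to tmpfs and record stop-the-world pauses
env.java.opts: "-Xloggc:/dev/shm/gc.log -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime"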
Hi Steven,
Yes, GC is a big overhead; it may cause your CPU utilization to reach
100%, with every process stopping work. We ran into this a while back too.
How much memory did you assign to the TaskManager? What was your CPU
utilization when your TaskManager was considered 'killed'?
Bowen
Till,
Once our job is restarted for some reason (e.g. a TaskManager container got
killed), it can get stuck in a continuous restart loop for hours. Right now, I
suspect it is caused by GC pauses during restart; our job has very high
memory allocation in steady state. High GC pause then caused akka timeout
Hi Steven,
Quick correction for Flink 1.2: indeed, the MetricFetcher does not pick up
the right timeout value from the configuration. Instead it uses a hardcoded
10s timeout. This has only been changed recently and is already committed
to the master. So with the next release, 1.4, it will properly pi
Till/Chesnay, thanks for the answers. Looks like this is a result/symptom of
an underlying stability issue that I am trying to track down.
It is Flink 1.2.
On Fri, Aug 18, 2017 at 12:24 AM, Chesnay Schepler wrote:
The MetricFetcher always uses the default akka timeout value.
On 18.08.2017 09:07, Till Rohrmann wrote:
Hi Steven,
I thought that the MetricFetcher picks up the right timeout from the
configuration. Which version of Flink are you using?
The timeout is not a critical problem for the job health.
Cheers,
Till
On Fri, Aug 18, 2017 at 7:22 AM, Steven Wu wrote:
>
> We have set akka.ask.timeout to 60
Yes, the issue I mentioned is solved in the next release.
On Tue, May 5, 2015 at 12:56 AM, Flavio Pompermaier wrote:
Thanks for the support! So this issue will be solved in the next 0.9
release?
On Tue, May 5, 2015 at 12:17 AM, Stephan Ewen wrote:
Here is a list of all values you can set:
http://ci.apache.org/projects/flink/flink-docs-master/setup/config.html
On Tue, May 5, 2015 at 12:17 AM, Stephan Ewen wrote:
Hi Flavio!
This may be a known and fixed issue. It relates to the fact that task
deployment may take a long time in the case of big jar files. The current
master should not have this issue any more, but 0.9-SNAPSHOT has it.
As a temporary workaround, you can increase "akka.ask.timeout" in the Flink
configuration.
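As a concrete illustration of that workaround, the timeout is set in
flink-conf.yaml; the value below is only an example of raising it above the
10 s default, not a recommendation from this thread:
akka.ask.timeout: 100 s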