Re: Timers and Checkpoints

2018-05-25 Thread Alberto Mancini
Hello Timo, we found that the problem was not related to a timer but to a hardware issue in the production system. On the other hand, the NPE in HeapInternalTimerService in the testing system occurred because the savepoint was created by a different version of the application; definite

Re: [ANNOUNCE] Apache Flink 1.5.0 release

2018-05-25 Thread Till Rohrmann
Quick update: I had to update the date of the release blog post which also changed the URL. It can now be found here: http://flink.apache.org/news/2018/05/25/release-1.5.0.html Cheers, Till On Fri, May 25, 2018 at 7:03 PM, Hao Sun wrote: > This is great. Thanks for the effort to get this out!

Re: [ANNOUNCE] Apache Flink 1.5.0 release

2018-05-25 Thread Hao Sun
This is great. Thanks for the effort to get this out! On Fri, May 25, 2018 at 9:47 AM Till Rohrmann wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink 1.5.0. > > Apache Flink® is an open-source stream processing framework for > distributed, high-performin

Re: latency critical job

2018-05-25 Thread Rong Rong
Hi Makeyang, +1 on Timo's point. We have dealt with this kind of problem before, and in general Flink can handle latency if the job is implemented correctly and assigned the correct amount of computation resources (depending on what kind of resource isolation/containerization you are doing) to handle addi

[ANNOUNCE] Apache Flink 1.5.0 release

2018-05-25 Thread Till Rohrmann
The Apache Flink community is very happy to announce the release of Apache Flink 1.5.0. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for download at: https://flin

Re: Task Manager detached under load

2018-05-25 Thread Steven Wu
Till, thanks for the follow-up. looking forward to 1.5 :) On Fri, May 25, 2018 at 2:11 AM, Till Rohrmann wrote: > Hi Steven, > > we don't have `jobmanager.exit-on-fatal-akka-error` because then the JM > would also be killed if a single TM gets quarantined. This is also not a > desired behaviour.

Execution model in Flink

2018-05-25 Thread Esa Heikkinen
Hi I don't know whether this question is suitable for this forum, but I take the risk and ask :) In my understanding the execution model in Flink is very data (flow) stream oriented and specific. It is difficult to build a control flow logic (like state-machine) outside of the stream specific

Re: latency critical job

2018-05-25 Thread Timo Walther
Hi, usually Flink should have constant latency if the job is implemented correctly. But if you want to implement something like an external monitoring process, you can use the REST API [1] and metrics [2] to model such a behavior by restarting your application. In theory, you could also imp

Re: Timers and Checkpoints

2018-05-25 Thread Alberto Mancini
Hello Timo, thanks for the response. We are still investigating in the production system, but in test we now get this exception, which seems very much related to issue 6291. java.lang.Exception: Could not perform checkpoint 13468 for operator Aggregator -> Sink: HBase (1/1). at org.ap

Re: Timers and Checkpoints

2018-05-25 Thread Timo Walther
Hi Alberto, do you get exactly the same exception? Maybe you can share some logs with us? Regards, Timo On 25.05.18 at 13:41, Alberto Mancini wrote: Hello, I think we are experiencing this issue: https://issues.apache.org/jira/browse/FLINK-6291 In fact we have a long running job that is un

Timers and Checkpoints

2018-05-25 Thread Alberto Mancini
Hello, I think we are experiencing this issue: https://issues.apache.org/jira/browse/FLINK-6291 In fact we have a long running job that is unable to complete a checkpoint and so we are unable to create a savepoint. I do not really understand from 6291 how the timer service has been removed in my

Re: does Flink call FullGC to reclaim direct memory mainly occupied by RocksDB

2018-05-25 Thread sihua zhou
Hi, I think that each time a job is canceled, Flink closes RocksDB to release the resources held by it. You can find this in RocksDBKeyedStateBackend. Best, Sihua On 05/25/2018 19:27, makeyang wrote: each time when cancel Job does Flink call FullGC to reclaim direct memory mainly occup

does Flink call FullGC to reclaim direct memory mainly occupied by RocksDB

2018-05-25 Thread makeyang
Each time a job is canceled, does Flink call a full GC to reclaim direct memory mainly occupied by RocksDB? If so, where does this happen? If not, why not?

Re: Akka Http used in custom RichSourceFunction

2018-05-25 Thread Piotr Nowojski
Thanks for getting back and I’m glad that you were able to resolve your issue :) Piotrek > On 25 May 2018, at 11:25, Niels van Kaam wrote: > > Hi, > > It was indeed the problem, and shading my akka dependency has solved the > problem. Thank you for pointing that out! > > For references: > >

Re: Akka Http used in custom RichSourceFunction

2018-05-25 Thread Niels van Kaam
Hi, It was indeed the problem, and shading my akka dependency has solved the problem. Thank you for pointing that out! For reference: When shading akka you also need to merge the reference.conf files from akka, or it will fail. This page contains useful documentation on how to shade akka: https

Re: Task Manager detached under load

2018-05-25 Thread Till Rohrmann
Hi Steven, we don't have `jobmanager.exit-on-fatal-akka-error` because then the JM would also be killed if a single TM gets quarantined. This is also not a desired behaviour. With Flink 1.5 the problem with quarantining should be gone since we don't rely anymore on Akka's death watch and instead

Re: Akka Http used in custom RichSourceFunction

2018-05-25 Thread Piotr Nowojski
Hi, Yes, this might be the cause of the issue, because indeed it looks like your akka’s version is leaking to Flink’s classloader. Piotrek > On 25 May 2018, at 09:40, Niels van Kaam wrote: > > Hi Piotrek, > > Thank you for your response! > > I am currently just testing the job in a local en

Re: Kryo Exception

2018-05-25 Thread Stefan Richter
I agree, it looks like one of the two mentioned issues. > On 25.05.2018 at 06:15, sihua zhou wrote: > > Hi Gordon, > > I think this might not be caused by > https://issues.apache.org/jira/browse/FLINK-9263 > , the bug in FLINK-9263 > should

Re: Multiple stream operator watermark handling

2018-05-25 Thread Piotr Nowojski
Great to hear that this worked out for you :) Progression of watermarks on an empty stream is a known issue that we are working to resolve in the future. The usual recommended workaround is to send a custom blank event (which should be ignored) once in a while. I have expanded the documentatio
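
To illustrate the workaround mentioned in this message, here is a minimal sketch in plain Python. It is a toy model, not Flink's API: the class `TwoInputOperator` and the function `demo` are hypothetical names invented for illustration. It relies only on the rule that an operator with several inputs advances its watermark to the minimum of its input watermarks, so an input that never receives events holds the operator back until a blank event lets it emit a watermark.

```python
from dataclasses import dataclass, field

MIN_TS = float("-inf")  # stand-in for "no watermark seen yet"


@dataclass
class TwoInputOperator:
    """Toy model (hypothetical, not a Flink class) of a two-input operator."""
    input_watermarks: list = field(default_factory=lambda: [MIN_TS, MIN_TS])

    def on_watermark(self, channel: int, timestamp: float) -> float:
        # A watermark never moves backwards on its own channel.
        self.input_watermarks[channel] = max(self.input_watermarks[channel], timestamp)
        return self.current_watermark()

    def current_watermark(self) -> float:
        # The operator can only advance as far as its slowest input.
        return min(self.input_watermarks)


def demo():
    op = TwoInputOperator()
    # Channel 0 is busy; channel 1 is an empty stream and emits nothing.
    for ts in (10, 20, 30):
        op.on_watermark(0, ts)
    stalled = op.current_watermark()
    # A blank event on channel 1 (ignored downstream) lets its source
    # emit a watermark, which unblocks the operator.
    op.on_watermark(1, 25)
    advanced = op.current_watermark()
    return stalled, advanced
```

In this model the operator stays at the initial watermark for as long as channel 1 is silent, and moves forward only once the blank event arrives, even though channel 0 has already progressed further.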

latency critical job

2018-05-25 Thread makeyang
Some jobs are latency critical, which means they cannot tolerate latency above a certain threshold. Will Flink provide a timeout operator in the near future, meaning that when one operator times out, the JobManager schedules a new operator which starts from the previous state of the operator and keeps dealing with new ev

Re: Akka Http used in custom RichSourceFunction

2018-05-25 Thread Niels van Kaam
Hi Piotrek, Thank you for your response! I am currently just testing the job in a local environment. I think that means all classes are in the Java classpath, which might also be the issue then. If I am correct that means I am currently not using dynamic classloading and just overwriting the Akka