Re: Question about concurrent checkpoints

2017-09-21 Thread Nico Kruber
On Thursday, 21 September 2017 20:08:01 CEST Narendra Joshi wrote: > Nico Kruber writes: > > according to [1], even with asynchronous state snapshots (see [2]), a > > checkpoint is only complete after all sinks have received the barriers and > > all (asynchronous)

Re: Question about concurrent checkpoints

2017-09-21 Thread Narendra Joshi
Nico Kruber writes: > Hi Narendra, > according to [1], even with asynchronous state snapshots (see [2]), a > checkpoint is only complete after all sinks have received the barriers and > all > (asynchronous) snapshots have been processed. Since, if the number of >

Re: [DISCUSS] Dropping Scala 2.10

2017-09-21 Thread Federico D'Ambrosio
Ok, thank you all for the clarification. @Stephan: I'm using Kafka 0.10, I guess the problem I had then was actually unrelated to specific Kafka version Federico D'Ambrosio Il 21 set 2017 16:30, "Stephan Ewen" ha scritto: > +1 > > @ Frederico: I think Aljoscha is right -

Re: [DISCUSS] Dropping Scala 2.10

2017-09-21 Thread Stephan Ewen
+1 @ Frederico: I think Aljoscha is right - Flink only executes Kafka client code, which is Scala independent from 0.9 on. Do you use Kafka 0.8 still? On Wed, Sep 20, 2017 at 10:00 PM, Aljoscha Krettek wrote: > Hi Federico, > > As far as I know, the Kafka client code has

Re: Task Manager was lost/killed due to full GC

2017-09-21 Thread Stephan Ewen
Hi! The garbage collection stats actually look okay, not terribly bad - almost surprised that this seems to cause failures. Can you check whether you find messages in the TM / JM log about heartbeat timeouts, actor systems being "gated" or "quarantined"? Would also be interesting to know how

Re: StreamCorruptedException

2017-09-21 Thread Tzu-Li (Gordon) Tai
Hi Sridhar, Sorry that this didn't get a response earlier. According to the trace, it seems like the job failed during the process, and when trying to automatically restore from a checkpoint, deserialization of a CEP `IterativeCondition` object failed. As far as I can tell, CEP operators are

Re: Savepoints and migrating value state data types

2017-09-21 Thread Tzu-Li (Gordon) Tai
Hi! The exception that you have bumped into indicates that on the restore of the savepoint, the serializer for that registered state in the savepoint no longer exists. This prevents restoring savepoints taken with memory state backends because there will be no serializer available to deserialize

Re: Savepoints and migrating value state data types

2017-09-21 Thread Nico Kruber
Hi Marc, I assume you have set a UID for your CoProcessFunction as described in [1]? Also, can you provide the Flink version you are working with and the serializer you are using? If you have the UID set, your strategy seems to be the same as proposed by [2]: "Although it is not possible to

Re: on Wikipedia Edit Stream example

2017-09-21 Thread Nico Kruber
Hi Haibin, if you execute the program as in the Wiki edit example [1] from mvn as given or from the IDE, a local Flink environment will be set up which is not accessible form the outside by default. This is done by the call to StreamExecutionEnvironment.getExecutionEnvironment(); which also

Re: Question about concurrent checkpoints

2017-09-21 Thread Nico Kruber
Hi Narendra, according to [1], even with asynchronous state snapshots (see [2]), a checkpoint is only complete after all sinks have received the barriers and all (asynchronous) snapshots have been processed. Since, if the number of concurrent checkpoints is 0, no checkpoint barriers will be

Question about concurrent checkpoints

2017-09-21 Thread Narendra Joshi
Hi, How are concurrent snapshots taken for an operator? Let's say an operator receives barriers for a checkpoint from all of its inputs. It triggers the checkpoint. Now, the checkpoint starts getting saved asynchronously. Before the checkpoint is acknowledged, the operator receives all barriers

Re: Recommended way to schedule Flink jobs periodically on EMR

2017-09-21 Thread Konstantin Gregor
Hi, we start our Flink Jobs on EMR with Lambda functions. Lambda functions can have various triggers, among many others they can be triggered by some CloudWatch rule. So you could create a rule based on your desires that triggers a Lambda function, which will in turn spin up an EMR cluster with

on Wikipedia Edit Stream example

2017-09-21 Thread Liu Haibin
Hi, I'm wondering why we don't need to run ./bin/start-local.sh for "Wikipedia Edit Stream" example but we need to do it for "Quickstart" example. I found that I can access http://localhost:8081 for "Quickstart" example but not for "Wikipedia Edit Stream". Where is "Wikipedia Edit Stream"