Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-26 Thread Aljoscha Krettek
You’re right, it’s currently not possible. I created a Jira issue to track the problem: https://issues.apache.org/jira/browse/BEAM-1812 It shouldn’t be too hard to add this, since it boils down to forwarding some configuration settings. Best,
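[Editor's note: for context, this is roughly the Flink-side setting the runner would need to forward. A minimal sketch against the Flink 1.2-era API, not the FlinkRunner's actual wiring:]

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExternalizedCheckpointExample {
  public static void main(String[] args) {
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();

    // Checkpoint every 60 seconds.
    env.enableCheckpointing(60_000);

    // Retain the checkpoint data on cancellation/failure so the job can be
    // resumed from it later; this is what "externalized checkpoints" means.
    env.getCheckpointConfig().enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
  }
}
```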

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-24 Thread Jins George
Currently org.apache.beam.runners.flink.FlinkPipelineOptions does not have a way to configure externalized checkpoints. Is that something on the roadmap for the FlinkRunner? Thanks, Jins George On 03/23/2017 10:27 AM, Aljoscha Krettek wrote: For this you would use externalised checkpoints:
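[Editor's note: as a point of reference, checkpointing itself can already be switched on through the pipeline options. A minimal sketch, assuming the checkpointingInterval option exposed by FlinkPipelineOptions; the externalized-checkpoint flag discussed in this thread is the part that is missing:]

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FlinkCheckpointingOptionsExample {
  public static void main(String[] args) {
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);

    // Enable Flink checkpointing every 60 seconds. There is currently no
    // corresponding option for retaining (externalizing) those checkpoints.
    options.setCheckpointingInterval(60_000L);

    Pipeline pipeline = Pipeline.create(options);
    // ... build and run the pipeline ...
  }
}
```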

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-22 Thread Jins George
Thanks Aljoscha for the clarification. Savepoints work fine in the case of a controlled stop and restart. In the case of a failure (say the entire job failed due to a node crash or an application software bug), is there a way to resume from the checkpoint on restarting the application? Checkpoint location is

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-22 Thread Amit Sela
On Tue, Mar 21, 2017 at 11:59 PM Mingmin Xu wrote: > In SparkRunner, the default checkpoint storage is TmpCheckpointDirFactory. > Can it restore during job restart? -- I have not tested the runner in streaming for > some time. > TmpCheckpointDirFactory simply points to a "default"
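[Editor's note: the Spark runner's checkpoint directory can be pointed at durable storage rather than the temporary default. A minimal sketch, assuming the checkpointDir option on SparkPipelineOptions; the HDFS path is a placeholder:]

```java
import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SparkCheckpointDirExample {
  public static void main(String[] args) {
    SparkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);

    // Point checkpoints at durable storage (e.g. HDFS) instead of the
    // temporary local default, so state can survive a driver restart.
    options.setCheckpointDir("hdfs:///beam/checkpoints/my-job");  // placeholder path

    Pipeline pipeline = Pipeline.create(options);
    // ... build and run the pipeline ...
  }
}
```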

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-22 Thread Aljoscha Krettek
As I mentioned before, when running a Flink Job and simply cancelling it, all state about that job is discarded (with some exceptions, such as externalised checkpoints). If you want the state of a Job to survive a cancellation you have to perform a savepoint [1] and then, when restarting the Job

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Amit Sela
On Tue, Mar 21, 2017 at 7:26 PM Mingmin Xu wrote: > Moving the discussion to the dev-list > > Savepoints in Flink, and checkpoints in Spark, should be good enough to > handle this case. > > When people don't enable these features, for example only need at-most-once > The Spark runner

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Mingmin Xu
Moving the discussion to the dev-list. Savepoints in Flink, and checkpoints in Spark, should be good enough to handle this case. When people don't enable these features, for example when only at-most-once semantics are needed, each unbounded IO should try its best to restore from the last offset, although CheckpointMark is
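[Editor's note: as an illustration of the "restore from last offset" behaviour being discussed, a minimal sketch of a KafkaIO read that sets a stable consumer group and an offset-reset policy, so Kafka's own committed offsets can serve as a best-effort fallback when no CheckpointMark is restored. The bootstrap servers, topic, and group id are placeholders, and updateConsumerProperties is assumed to be the consumer-configuration hook available in KafkaIO at the time:]

```java
import com.google.common.collect.ImmutableMap;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaOffsetFallbackExample {
  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("kafka:9092")             // placeholder
            .withTopic("events")                            // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // With a stable group.id, the Kafka consumer's committed offsets
            // give a best-effort resume point when the runner does not restore
            // a CheckpointMark; auto.offset.reset only applies when no
            // committed offset exists for the group.
            .updateConsumerProperties(
                ImmutableMap.<String, Object>of(
                    "group.id", "beam-demo-consumer",       // placeholder
                    "auto.offset.reset", "earliest"))
            .withoutMetadata());

    pipeline.run();
  }
}
```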