Re: how to deploy new code with checkpointing

2016-04-12 Thread Cody Koeninger
- Checkpointing alone isn't enough to get exactly-once semantics. Events will be replayed in case of failure. You must have idempotent output operations.
- Another way to handle upgrades is to just start a second app with the new code, then stop the old one once everything's caught up.
On Tue,
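To make the idempotent-output point concrete, here is a minimal sketch, not taken from the thread. It assumes each record can be keyed by its Kafka topic, partition and offset, and it uses an in-memory SQLite table as a stand-in for the real sink. The table name `events` and the helpers `make_store` and `write_batch` are hypothetical.

```python
import sqlite3


def make_store():
    """Create a stand-in sink (hypothetical schema) keyed by record position."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "topic TEXT, part INTEGER, kafka_offset INTEGER, payload TEXT, "
        "PRIMARY KEY (topic, part, kafka_offset))"
    )
    return conn


def write_batch(conn, records):
    """Write (topic, partition, offset, payload) tuples idempotently.

    The primary key makes a replayed record overwrite its earlier row
    instead of adding a duplicate.
    """
    with conn:  # one transaction per batch; rolled back on error
        conn.executemany(
            "INSERT OR REPLACE INTO events "
            "(topic, part, kafka_offset, payload) VALUES (?, ?, ?, ?)",
            records,
        )
```

Writing the same batch twice, as would happen when events are replayed after a failure, leaves one row per record. The same idea carries over to any sink that supports keyed upserts.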

Re: how to deploy new code with checkpointing

2016-04-12 Thread Soumitra Siddharth Johri
I think before doing a code update you would want to gracefully shut down your streaming job and checkpoint the processed offsets (and any state that you maintain) in a database or HDFS. When you start the job up, it should read this checkpoint file, build the necessary state and begin
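The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than code from the thread: the offsets and a small piece of state (running totals) live in SQLite instead of a Spark checkpoint, so they do not depend on Java serialization of the application classes. The table names and the helpers `open_store`, `load_offsets` and `commit_batch` are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the external store that replaces the Spark checkpoint (assumed schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets ("
        "topic TEXT, part INTEGER, next_offset INTEGER, "
        "PRIMARY KEY (topic, part))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS totals (key TEXT PRIMARY KEY, total INTEGER)"
    )
    conn.commit()
    return conn


def load_offsets(conn):
    """Return {(topic, partition): next_offset} to resume from at startup."""
    rows = conn.execute("SELECT topic, part, next_offset FROM offsets")
    return {(topic, part): off for topic, part, off in rows}


def commit_batch(conn, topic, part, until_offset, counts):
    """Store a batch's state update and its end offset in one transaction."""
    with conn:  # commits both changes together, or rolls both back
        for key, n in counts.items():
            conn.execute(
                "INSERT OR IGNORE INTO totals (key, total) VALUES (?, 0)", (key,)
            )
            conn.execute(
                "UPDATE totals SET total = total + ? WHERE key = ?", (n, key)
            )
        conn.execute(
            "INSERT OR REPLACE INTO offsets (topic, part, next_offset) "
            "VALUES (?, ?, ?)",
            (topic, part, until_offset),
        )
```

On startup, a new version of the job would call `load_offsets` and begin reading from those positions. Because the state and the offsets are committed in the same transaction, a crash between batches cannot leave one updated without the other.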

Re: how to deploy new code with checkpointing

2016-04-11 Thread Siva Gudavalli
Okie. That makes sense. Any recommendations on how to manage changes to my Spark Streaming app and achieve fault tolerance at the same time?
On Mon, Apr 11, 2016 at 8:16 PM, Shixiong(Ryan) Zhu wrote:
> You cannot. Streaming doesn't support it because code changes

Re: how to deploy new code with checkpointing

2016-04-11 Thread Shixiong(Ryan) Zhu
You cannot. Streaming doesn't support it because code changes will break Java serialization.
On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:
> hello,
>
> i am writing a spark streaming application to read data from kafka. I am
> using no receiver approach and enabled

how to deploy new code with checkpointing

2016-04-11 Thread Siva Gudavalli
Hello, I am writing a Spark Streaming application to read data from Kafka. I am using the no-receiver approach and have enabled checkpointing to make sure I am not reading messages again in case of failure (exactly-once semantics). I have a quick question: how checkpointing needs to be configured to