Re: how to deploy new code with checkpointing
- Checkpointing alone isn't enough to get exactly-once semantics. Events will be replayed in case of failure, so you must have idempotent output operations.

- Another way to handle upgrades is to start a second app with the new code, then stop the old one once everything has caught up.

On Tue, Apr 12, 2016 at 1:15 AM, Soumitra Siddharth Johri wrote:
> I think before doing a code update you would want to gracefully shut down
> your streaming job and checkpoint the processed offsets (and any state
> you maintain) in a database or HDFS. When you start the job back up, it
> should read this checkpoint, rebuild the necessary state, and begin
> processing from the last offset processed.
>
> Another approach is to checkpoint the processed offsets in the streaming
> job whenever you read from Kafka. Then, before reading the next batch,
> instead of relying on the Spark checkpoint for offsets, read from the
> last processed offset that you saved.
>
> Regards
> Soumitra
>
> On Apr 11, 2016, at 8:31 PM, Siva Gudavalli wrote:
>
> Okie, that makes sense.
>
> Any recommendations on how to manage changes to my Spark Streaming app
> while achieving fault tolerance at the same time?
>
> On Mon, Apr 11, 2016 at 8:16 PM, Shixiong(Ryan) Zhu wrote:
>>
>> You cannot. Streaming doesn't support it because code changes will break
>> Java serialization.
>>
>> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:
>>>
>>> Hello,
>>>
>>> I am writing a Spark Streaming application to read data from Kafka. I am
>>> using the no-receiver (direct) approach and have enabled checkpointing
>>> to make sure I am not reading messages again in case of failure
>>> (exactly-once semantics).
>>>
>>> I have a quick question: how does checkpointing need to be configured to
>>> handle code changes in my Spark Streaming app?
>>>
>>> Can you please suggest? Hope the question makes sense.
>>>
>>> Thank you
>>>
>>> Regards
>>> Shiv
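The first point above, idempotent outputs, can be sketched as follows. This is a minimal illustration, not Spark or Kafka API code; `IdempotentSink` is a hypothetical stand-in for a real store with upsert semantics, such as a database table whose primary key is (topic, partition, offset).

```python
# Sketch of an idempotent sink: replaying the same batch of records
# after a failure is harmless because writes are keyed by their offsets,
# so a replayed record overwrites itself instead of duplicating.

class IdempotentSink:
    def __init__(self):
        # Stands in for a store with upsert semantics, e.g. a table
        # keyed by (topic, partition, offset).
        self.rows = {}

    def write(self, topic, partition, offset, value):
        # Upsert: same key -> same row, regardless of replays.
        self.rows[(topic, partition, offset)] = value

sink = IdempotentSink()
batch = [("events", 0, 41, "a"), ("events", 0, 42, "b")]
for rec in batch:
    sink.write(*rec)
# Simulate a failure-triggered replay of the same batch:
for rec in batch:
    sink.write(*rec)
assert len(sink.rows) == 2  # no duplicates despite the replay
```

With outputs shaped like this, at-least-once delivery from replayed batches still yields exactly-once results downstream.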
Re: how to deploy new code with checkpointing
I think before doing a code update you would want to gracefully shut down your streaming job and checkpoint the processed offsets (and any state you maintain) in a database or HDFS. When you start the job back up, it should read this checkpoint, rebuild the necessary state, and begin processing from the last offset processed.

Another approach is to checkpoint the processed offsets in the streaming job whenever you read from Kafka. Then, before reading the next batch, instead of relying on the Spark checkpoint for offsets, read from the last processed offset that you saved.

Regards
Soumitra

> On Apr 11, 2016, at 8:31 PM, Siva Gudavalli wrote:
>
> Okie, that makes sense.
>
> Any recommendations on how to manage changes to my Spark Streaming app
> while achieving fault tolerance at the same time?
>
>> On Mon, Apr 11, 2016 at 8:16 PM, Shixiong(Ryan) Zhu wrote:
>>
>> You cannot. Streaming doesn't support it because code changes will break
>> Java serialization.
>>
>>> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:
>>>
>>> Hello,
>>>
>>> I am writing a Spark Streaming application to read data from Kafka. I am
>>> using the no-receiver (direct) approach and have enabled checkpointing
>>> to make sure I am not reading messages again in case of failure
>>> (exactly-once semantics).
>>>
>>> I have a quick question: how does checkpointing need to be configured to
>>> handle code changes in my Spark Streaming app?
>>>
>>> Can you please suggest? Hope the question makes sense.
>>>
>>> Thank you
>>>
>>> Regards
>>> Shiv
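The second approach above, managing offsets yourself rather than relying on Spark's checkpoint, can be sketched like this. `OffsetStore` is an illustrative name, not a real API; in practice this would be a write to a database or HDFS, committed atomically with (or just after) the batch's output.

```python
# Sketch of self-managed offset checkpointing. On restart, the (possibly
# upgraded) job asks the external store for the last committed offset
# per partition and resumes from offset + 1, independent of Spark's
# code-version-sensitive checkpoint files.

class OffsetStore:
    def __init__(self):
        # (topic, partition) -> last processed offset
        self.committed = {}

    def commit(self, topic, partition, offset):
        # Record that everything up to and including `offset` is done.
        self.committed[(topic, partition)] = offset

    def resume_from(self, topic, partition, default=0):
        # Where the next run should start reading.
        last = self.committed.get((topic, partition))
        return default if last is None else last + 1

store = OffsetStore()
# Process a batch covering offsets 0..99 of partition 0, then commit:
store.commit("events", 0, 99)
# After an upgrade and restart, the new code resumes at offset 100:
assert store.resume_from("events", 0) == 100
```

Because the offsets live outside Spark's checkpoint directory, a redeployed job with new code can pick up exactly where the old one left off.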
Re: how to deploy new code with checkpointing
Okie, that makes sense.

Any recommendations on how to manage changes to my Spark Streaming app while achieving fault tolerance at the same time?

On Mon, Apr 11, 2016 at 8:16 PM, Shixiong(Ryan) Zhu wrote:
> You cannot. Streaming doesn't support it because code changes will break
> Java serialization.
>
> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:
>
>> Hello,
>>
>> I am writing a Spark Streaming application to read data from Kafka. I am
>> using the no-receiver (direct) approach and have enabled checkpointing
>> to make sure I am not reading messages again in case of failure
>> (exactly-once semantics).
>>
>> I have a quick question: how does checkpointing need to be configured to
>> handle code changes in my Spark Streaming app?
>>
>> Can you please suggest? Hope the question makes sense.
>>
>> Thank you
>>
>> Regards
>> Shiv
Re: how to deploy new code with checkpointing
You cannot. Streaming doesn't support it because code changes will break Java serialization.

On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:
> Hello,
>
> I am writing a Spark Streaming application to read data from Kafka. I am
> using the no-receiver (direct) approach and have enabled checkpointing to
> make sure I am not reading messages again in case of failure
> (exactly-once semantics).
>
> I have a quick question: how does checkpointing need to be configured to
> handle code changes in my Spark Streaming app?
>
> Can you please suggest? Hope the question makes sense.
>
> Thank you
>
> Regards
> Shiv
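The reason "code changes will break Java serialization" is that a Spark Streaming checkpoint contains serialized objects (the DStream graph and closures), and deserializing them against recompiled classes fails. A rough analogy, using Python's pickle instead of Java serialization (Spark itself uses the latter; this only illustrates the failure mode):

```python
import pickle

# A "v1" state class, checkpointed by serializing an instance of it:
class StreamStateV1:
    def __init__(self, count):
        self.count = count

checkpoint = pickle.dumps(StreamStateV1(7))

# Deploy "new code": the old class definition no longer exists.
del StreamStateV1

# Recovering from the old checkpoint now fails, because the serialized
# bytes reference a class the new code base can no longer resolve.
try:
    pickle.loads(checkpoint)
    restored = True
except AttributeError:
    restored = False

assert restored is False
```

This is why the other replies suggest keeping recovery state (offsets, app state) in an external store in a code-independent format, rather than inside the serialized checkpoint.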
how to deploy new code with checkpointing
Hello,

I am writing a Spark Streaming application to read data from Kafka. I am using the no-receiver (direct) approach and have enabled checkpointing to make sure I am not reading messages again in case of failure (exactly-once semantics).

I have a quick question: how does checkpointing need to be configured to handle code changes in my Spark Streaming app?

Can you please suggest? Hope the question makes sense.

Thank you

Regards
Shiv