Re: how to deploy new code with checkpointing

2016-04-12 Thread Cody Koeninger
- Checkpointing alone isn't enough to get exactly-once semantics.
Events will be replayed in case of failure.  You must have idempotent
output operations.

- Another way to handle upgrades is to just start a second app with
the new code, then stop the old one once everything's caught up.
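
The idempotent-output requirement above can be sketched in plain Python (no Spark involved; the keyed-upsert scheme is an assumed illustration, not the only way to do it). Each record carries a deterministic key and the sink is an upsert, so replaying a batch after a failure rewrites the same rows instead of duplicating them:

```python
def write_batch(store, batch):
    """Upsert each (key, value) record; safe to call again on a replay."""
    for key, value in batch:
        store[key] = value  # overwrite is a no-op on replay

# a plain dict stands in for a database table with a primary key
store = {}
batch = [("order-1", 100), ("order-2", 250)]
write_batch(store, batch)
write_batch(store, batch)  # simulated replay after a failure
print(len(store))  # still 2 records, not 4
```

Because the second (replayed) write lands on the same keys, the failure-time replay that checkpointing causes does not corrupt the output.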

On Tue, Apr 12, 2016 at 1:15 AM, Soumitra Siddharth Johri wrote:
> I think before doing a code update you would want to gracefully shut down
> your streaming job and checkpoint the processed offsets (and any state that
> you maintain) in a database or HDFS.
> When you start the job back up, it should read this checkpoint, build the
> necessary state, and begin processing from the last offset processed.
>
> Another approach would be to checkpoint the processed offsets in the
> streaming job whenever you read from Kafka. Then, before reading the next
> batch, instead of relying on the Spark checkpoint for offsets, read from
> the last processed offset that you saved.
>
> Regards
> Soumitra
>
> On Apr 11, 2016, at 8:31 PM, Siva Gudavalli wrote:
>
> Okie. That makes sense.
>
> Any recommendations on how to manage changes to my Spark Streaming app and
> achieve fault tolerance at the same time?
>
> On Mon, Apr 11, 2016 at 8:16 PM, Shixiong(Ryan) Zhu wrote:
>>
>> You cannot. Streaming doesn't support it because code changes will break
>> Java serialization.
>>
>> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:
>>>
>>> hello,
>>>
>>> I am writing a Spark Streaming application to read data from Kafka. I am
>>> using the no-receiver (direct) approach and have enabled checkpointing to
>>> make sure I am not reading messages again in case of failure
>>> (exactly-once semantics).
>>>
>>> I have a quick question about how checkpointing needs to be configured to
>>> handle code changes in my Spark Streaming app.
>>>
>>> Can you please suggest? Hope the question makes sense.
>>>
>>> thank you
>>>
>>> regards
>>> shiv
>>
>>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: how to deploy new code with checkpointing

2016-04-12 Thread Soumitra Siddharth Johri
I think before doing a code update you would want to gracefully shut down your
streaming job and checkpoint the processed offsets (and any state that you
maintain) in a database or HDFS.
When you start the job back up, it should read this checkpoint, build the
necessary state, and begin processing from the last offset processed.
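
The shutdown/restore cycle described above can be sketched as follows (plain Python; the file layout, key names, and JSON format are illustrative assumptions, not a Spark API). The old version of the job writes its per-partition offsets and state on graceful shutdown; the upgraded code reads them back and resumes:

```python
import json
import os
import tempfile

CHECKPOINT = os.path.join(tempfile.gettempdir(), "app_checkpoint.json")

def save_checkpoint(offsets, state):
    """Persist offsets and job state; called once on graceful shutdown."""
    with open(CHECKPOINT, "w") as f:
        json.dump({"offsets": offsets, "state": state}, f)

def load_checkpoint():
    """Restore the last saved offsets/state, or start fresh on first run."""
    if not os.path.exists(CHECKPOINT):
        return {"offsets": {}, "state": {}}
    with open(CHECKPOINT) as f:
        return json.load(f)

# old version of the job shuts down gracefully:
save_checkpoint({"topic-0": 1042, "topic-1": 998}, {"running_total": 57})

# new code starts up, restores, and continues from the saved offsets:
cp = load_checkpoint()
print(cp["offsets"]["topic-0"])  # 1042
```

Because the checkpoint is written in a neutral format (here JSON) rather than Java serialization, the new code can read it even though its classes have changed.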

Another approach would be to checkpoint the processed offsets in the streaming
job whenever you read from Kafka. Then, before reading the next batch, instead
of relying on the Spark checkpoint for offsets, read from the last processed
offset that you saved.
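
An illustrative sketch of this second approach (plain Python, no Spark or Kafka client; the in-memory log and store are stand-ins for a real partition and a real database): after processing each batch, record the last offset per partition in your own store, and on the next read start from that offset plus one:

```python
offset_store = {}            # partition -> last processed offset
log = {0: list(range(10))}   # fake partition 0 holding offsets 0..9

def read_next_batch(partition, batch_size):
    """Resume from our own store, not from Spark's checkpoint."""
    start = offset_store.get(partition, -1) + 1
    return log[partition][start:start + batch_size]

def commit(partition, batch):
    """Record the last offset we fully processed."""
    if batch:
        offset_store[partition] = batch[-1]

b1 = read_next_batch(0, 4)   # offsets 0..3
commit(0, b1)
b2 = read_next_batch(0, 4)   # resumes at offset 4
print(b2[0])  # 4
```

Since the offsets live in your own store rather than in a serialized checkpoint, the upgraded job can pick up exactly where the old one left off.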

Regards
Soumitra

> On Apr 11, 2016, at 8:31 PM, Siva Gudavalli wrote:
> 
> Okie. That makes sense. 
> 
> Any recommendations on how to manage changes to my Spark Streaming app and
> achieve fault tolerance at the same time?
> 
>> On Mon, Apr 11, 2016 at 8:16 PM, Shixiong(Ryan) Zhu wrote:
>> You cannot. Streaming doesn't support it because code changes will break 
>> Java serialization.
>> 
>>> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:
>>> hello,
>>> 
>>> I am writing a Spark Streaming application to read data from Kafka. I am
>>> using the no-receiver (direct) approach and have enabled checkpointing to
>>> make sure I am not reading messages again in case of failure
>>> (exactly-once semantics).
>>> 
>>> I have a quick question about how checkpointing needs to be configured to
>>> handle code changes in my Spark Streaming app.
>>> 
>>> Can you please suggest? Hope the question makes sense.
>>> 
>>> thank you 
>>> 
>>> regards
>>> shiv
> 


Re: how to deploy new code with checkpointing

2016-04-11 Thread Siva Gudavalli
Okie. That makes sense.

Any recommendations on how to manage changes to my Spark Streaming app and
achieve fault tolerance at the same time?

On Mon, Apr 11, 2016 at 8:16 PM, Shixiong(Ryan) Zhu wrote:

> You cannot. Streaming doesn't support it because code changes will break
> Java serialization.
>
> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:
>
>> hello,
>>
>> I am writing a Spark Streaming application to read data from Kafka. I am
>> using the no-receiver (direct) approach and have enabled checkpointing to
>> make sure I am not reading messages again in case of failure
>> (exactly-once semantics).
>>
>> I have a quick question about how checkpointing needs to be configured to
>> handle code changes in my Spark Streaming app.
>>
>> Can you please suggest? Hope the question makes sense.
>>
>> thank you
>>
>> regards
>> shiv
>>
>
>


Re: how to deploy new code with checkpointing

2016-04-11 Thread Shixiong(Ryan) Zhu
You cannot. Streaming doesn't support it because code changes will break
Java serialization.

On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli wrote:

> hello,
>
> I am writing a Spark Streaming application to read data from Kafka. I am
> using the no-receiver (direct) approach and have enabled checkpointing to
> make sure I am not reading messages again in case of failure
> (exactly-once semantics).
>
> I have a quick question about how checkpointing needs to be configured to
> handle code changes in my Spark Streaming app.
>
> Can you please suggest? Hope the question makes sense.
>
> thank you
>
> regards
> shiv
>


how to deploy new code with checkpointing

2016-04-11 Thread Siva Gudavalli
hello,

I am writing a Spark Streaming application to read data from Kafka. I am
using the no-receiver (direct) approach and have enabled checkpointing to
make sure I am not reading messages again in case of failure
(exactly-once semantics).

I have a quick question about how checkpointing needs to be configured to
handle code changes in my Spark Streaming app.

Can you please suggest? Hope the question makes sense.

thank you

regards
shiv