Hi Olivier,

> Are you talking about the $checkpoint spout or MySpout (with the offset)? 

I was referring to the user spout (MySpout in this case).

> Does it mean all the emitted tuples are acked only when the $checkpoint.txId 
> event is acked (and so $checkpoint.txId acts as a barrier)? Which means when 
> tuples are acked (in MySpout), I am sure a state has been checkpointed. 

Yes, that is right. So when tuples are acked in MySpout, you can move your 
offsets.
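
A minimal sketch of moving offsets on ack, in plain Java with no Storm 
dependency. OffsetTracker and all of its method names are hypothetical. In a 
real spout, emitted() would be called from nextTuple(), acked() from 
ack(Object msgId) and failed() from fail(Object msgId). It assumes offsets are 
contiguous longs used as message ids, and it does not rely on acks arriving in 
emit order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeSet;

// Hypothetical helper, not part of Storm: tracks in-flight offsets for a spout.
class OffsetTracker {
    // Offsets emitted but not yet acked, kept sorted so the lowest is cheap to find.
    private final TreeSet<Long> pending = new TreeSet<>();
    // Offsets whose tuples failed and should be emitted again.
    private final Deque<Long> replay = new ArrayDeque<>();
    // Highest offset handed to the topology so far.
    private long highestEmitted;

    // lastCommitted is the offset restored at startup (-1 if nothing was committed).
    OffsetTracker(long lastCommitted) {
        this.highestEmitted = lastCommitted;
    }

    // Record that a tuple for this offset was emitted.
    void emitted(long offset) {
        pending.add(offset);
        highestEmitted = Math.max(highestEmitted, offset);
    }

    // Record an ack; acks may arrive in any order.
    void acked(long offset) {
        pending.remove(offset);
    }

    // Record a failure; the offset stays pending and is queued for replay.
    void failed(long offset) {
        if (pending.contains(offset)) {
            replay.add(offset);
        }
    }

    // Next offset to re-emit, or null if nothing is waiting for replay.
    Long pollReplay() {
        return replay.poll();
    }

    // Highest offset such that it and every lower offset has been acked.
    long committable() {
        return pending.isEmpty() ? highestEmitted : pending.first() - 1;
    }
}
```

The committable offset only advances past an offset once every lower offset 
has been acked, so a late or failed tuple holds it back and a restart replays 
from there (at-least-once).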

> Does it mean my checkpoint interval must be lower than the tuple timeout 
> (TOPOLOGY_MESSAGE_TIMEOUT_SECS)? 

Right, if you change the defaults, the checkpoint interval should be lower 
than the message timeout. The default checkpoint interval is 1s and the 
default message timeout is 30s.
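
This constraint can be checked before submitting a topology. A sketch, 
assuming the topology config is available as a plain map: the two key strings 
are the ones behind Config.TOPOLOGY_STATE_CHECKPOINT_INTERVAL and 
Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, while CheckpointConfigCheck and its 
fallback to the defaults are hypothetical. Note the different units 
(milliseconds versus seconds).

```java
import java.util.Map;

// Hypothetical helper, not part of Storm.
class CheckpointConfigCheck {
    static final String CHECKPOINT_INTERVAL_MS = "topology.state.checkpoint.interval.ms";
    static final String MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";

    // True when the checkpoint interval is strictly lower than the message timeout.
    // Missing keys fall back to the defaults mentioned above (1s and 30s).
    static boolean isValid(Map<String, Object> conf) {
        long intervalMs =
            ((Number) conf.getOrDefault(CHECKPOINT_INTERVAL_MS, 1000)).longValue();
        long timeoutMs =
            ((Number) conf.getOrDefault(MESSAGE_TIMEOUT_SECS, 30)).longValue() * 1000L;
        return intervalMs < timeoutMs;
    }
}
```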

Thanks,
Arun


From:  Olivier Mallassi
Reply-To:  "[email protected]"
Date:  Wednesday, May 18, 2016 at 12:57 AM
To:  "[email protected]"
Subject:  Re: State Checkpointing & spout state

Hi Arun, 

Thank you for your answer. 
I may be able to deal with "at least once" with idempotency and a stateful bolt 
(I still need to look at it in detail) but being able to checkpoint the state 
of the spout would be really helpful  ;) 

Anyway, I may have missed something in the doc but I just need to clarify your 
phrase "It checkpoints the states of all the bolts and once that’s successful, 
the tuples emitted by the spout are acked"

Are you talking about the $checkpoint spout or MySpout (with the offset)? 
Does it mean all the emitted tuples are acked only when the $checkpoint.txId 
event is acked (and so $checkpoint.txId acts as a barrier)? Which means when 
tuples are acked (in MySpout), I am sure a state has been checkpointed. 
Does it mean my checkpoint interval must be lower than the tuple timeout 
(TOPOLOGY_MESSAGE_TIMEOUT_SECS)? 

Many thanks for your help. 

Olivier. 

On Tue, May 17, 2016 at 2:12 PM, Arun Mahadevan <[email protected]> wrote:
Hi Olivier,

The state checkpointing currently does not checkpoint the state of the spout. 
It checkpoints the states of all the bolts and once that’s successful, the 
tuples emitted by the spout are acked. So currently it provides an 
at-least-once guarantee. 

In the ack method of the spout, you can update your offsets.

In future we will extend state checkpointing to checkpoint the state of the 
spout.

Thanks,
Arun


From: Olivier Mallassi
Reply-To: "[email protected]"
Date: Tuesday, May 17, 2016 at 5:29 PM
To: "[email protected]"
Subject: State Checkpointing & spout state

Hello 

I need to use state checkpointing for recovery (btw, very useful feature). I 
am facing an issue regarding how to checkpoint the state of my spout (not the 
checkpoint spout) as part of the "transaction". 

My spout reads from Kafka (or equivalent) and so keeps an offset of the 
last read events. 
It keeps track of 
- the last read offset
- the emitted and acknowledged events (with their associated offset). 
- the emitted and unacked events (so they can be replayed)

With state checkpointing, the bolt states will be kept, but how can I keep the 
state of the source? How can I ensure the spout replays events from the offset 
that matches the checkpoint (or txid)?
Are there any guarantees in Storm that the acks are received in the order they 
are sent? 

Cheers. 

olivier. 

