Are you sure that a batch can consist of tuples from different partitions? I am just asking because I am not sure; if it can, then your question seems valid, otherwise it is not valid anymore :-)
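For what it's worth, here is a rough sketch of what cross-partition batching would mean if the spout does work that way: one batch per txId, composed of one offset range from each partition. The names here are made up for illustration, not the storm-kafka API.

// A hypothetical sketch (illustrative names, not the storm-kafka API) of
// cross-partition batch composition: one batch per txId, built from one
// offset range per partition, so its tuples would span p1..pN together.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BatchCompositionSketch {
    // One slice of a batch: a partition plus the offset range to read.
    record OffsetRange(String partition, long from, long to) {}

    // Plan a batch: take each partition's next unread offset range.
    static List<OffsetRange> planBatch(Map<String, Long> nextOffset,
                                       long perPartitionSize) {
        List<OffsetRange> plan = new ArrayList<>();
        for (Map.Entry<String, Long> e : nextOffset.entrySet()) {
            plan.add(new OffsetRange(e.getKey(), e.getValue(),
                                     e.getValue() + perPartitionSize));
        }
        return plan;
    }

    public static void main(String[] args) {
        // If batching works this way, txId 10's batch draws from all four
        // partitions at once.
        System.out.println(planBatch(
                Map.of("p1", 0L, "p2", 0L, "p3", 0L, "p4", 0L), 100));
    }
}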
On Fri, May 2, 2014 at 7:42 AM, Ashok Gupta <[email protected]> wrote:

> Hi,
>
> I have a theoretical question about the guarantees OpaqueTridentKafkaSpout
> provides. Let me take an example to illustrate it.
>
> Suppose a batch with txId 10 has tuples t1, t2, t3, t4, coming from Kafka
> partitions p1, p2, p3, p4 respectively. When this batch is played for the
> very first time, it fails processing; however, the commit happens for
> tuple t3 in the database while it does not happen for tuples t1, t2, t4.
> Since the batch failed, the metadata in ZooKeeper is not expected to be
> updated, i.e. the offsets for p1, p2, p3, p4 will not be considered
> committed. The batch is expected to be replayed; however, suppose that
> before it gets replayed, Kafka partition p3 goes down. What happens now?
> I understand that another batch with the same transaction id containing
> t1, t2, t4 may be replayed, but since p3 is down, t3 won't be replayed.
> And since t3 is not replayed, even if the batch succeeds on replay, the
> offsets for p3 don't get updated in ZooKeeper. That is all fine as far as
> fault tolerance and opaque behavior are concerned.
>
> My concern is more about what happens when partition p3 comes back up and
> the spout starts reading from the last offset it committed successfully.
> Tuple t3 is going to be read from partition p3 again, and since it will
> land in a batch with some txId > 10 (say 19), it is going to be applied
> to the state again. This apparently violates the exactly-once semantics.
>
> Is the concern genuine or am I missing something?
>
> Regards
> --
> Ashok Gupta,
> (+1) 361-522-2172
> San Jose, CA

--
*Abhishek Bhattacharjee*
*Pune Institute of Computer Technology*
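For readers following the quoted scenario, here is a minimal, self-contained sketch of the opaque state update rule, modeled on Trident's OpaqueValue; the class and method names are illustrative, not the actual storm-core API. It shows the mechanics behind the question: the txId check only guards against replays of the same batch, not a later re-read of the same Kafka offset under a new txId.

// A minimal sketch of the opaque state update rule (modeled on Trident's
// OpaqueValue; names are illustrative, not the storm-core API).
import java.util.HashMap;
import java.util.Map;

public class OpaqueCounterSketch {
    // Per-key record: current value, the txId that produced it, and the
    // value as of the previous transaction (used to undo a partial batch).
    static class OpaqueValue {
        long currTxId, curr, prev;
        OpaqueValue(long currTxId, long curr, long prev) {
            this.currTxId = currTxId; this.curr = curr; this.prev = prev;
        }
    }

    final Map<String, OpaqueValue> store = new HashMap<>();

    // Apply a batch's partial count for key under transaction txId.
    void applyDelta(String key, long txId, long delta) {
        OpaqueValue v = store.get(key);
        if (v == null) {
            store.put(key, new OpaqueValue(txId, delta, 0));
        } else if (txId == v.currTxId) {
            // Same batch replayed (possibly with different contents):
            // recompute from the previous value, discarding the first try.
            v.curr = v.prev + delta;
        } else {
            // A new transaction: roll the window forward and apply.
            v.prev = v.curr;
            v.curr += delta;
            v.currTxId = txId;
        }
    }

    public static void main(String[] args) {
        OpaqueCounterSketch state = new OpaqueCounterSketch();
        state.applyDelta("t3-key", 10, 1);  // batch 10 commits t3's update
        // ... batch 10 is replayed WITHOUT t3 (p3 is down), so this key is
        // untouched; later p3 recovers and t3 arrives in batch 19 ...
        state.applyDelta("t3-key", 19, 1);  // 19 != 10, so applied again
        System.out.println(state.store.get("t3-key").curr);  // prints 2
    }
}

Under this rule the count for t3's key ends at 2 in the constructed scenario, which is exactly the double apply the question describes.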
