I think it can. That is where the coordinator comes into the picture: the coordinator defines the parameters of a batch, and the emitters emit the sub-portions of that batch.
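To make the replay case concrete, here is a minimal sketch of how I understand an opaque transactional state update to work: the store keeps the txid of the last batch applied, plus the value before and after that batch. This is a toy model, not the Storm API. The class name OpaqueCounter, the numeric tuple values, and the assumption that t1..t4 all update the same single counter are mine, for illustration only. It says nothing about state entries that the replayed batch does not touch.

```java
import java.util.Objects;

/** Toy model of one opaque-transactional counter entry (hypothetical, not the Storm API). */
final class OpaqueCounter {
    private Long lastTxId = null; // txid of the batch that last updated this entry
    private long prev = 0;        // value before that batch was applied
    private long curr = 0;        // value after that batch was applied

    /** Applies the partial sum contributed by one attempt of the batch with the given txid. */
    void apply(long txId, long batchSum) {
        if (Objects.equals(lastTxId, txId)) {
            // Same txid again: this is a replay, so discard the earlier
            // attempt and rebuild the current value from the previous one.
            curr = prev + batchSum;
        } else {
            // New txid: the current value becomes the new baseline.
            prev = curr;
            curr = curr + batchSum;
            lastTxId = txId;
        }
    }

    long value() {
        return curr;
    }

    public static void main(String[] args) {
        // Assumed tuple values: t1 = 1, t2 = 2, t3 = 4, t4 = 8.
        OpaqueCounter counter = new OpaqueCounter();
        counter.apply(10, 4);          // first attempt of txid 10: only t3 is committed
        counter.apply(10, 1 + 2 + 8);  // replay of txid 10 without t3 (p3 is down)
        counter.apply(19, 4);          // p3 is back: t3 arrives in txid 19
        System.out.println(counter.value()); // prints 15
    }
}
```

In this model the replay of txid 10 overwrites the effect of the first attempt, so t3 is counted only once when it comes back in txid 19. Whether that holds in general depends on the replayed batch touching the same state entry that the failed attempt wrote to.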

On Mon, May 5, 2014 at 12:50 PM, Abhishek Bhattacharjee <[email protected]> wrote:

> Are you sure that a batch can consist of tuples from different partitions?
> I am just asking; I am not sure. If it can, then your question seems to be
> valid; otherwise it is not valid anymore :-)
>
> On Fri, May 2, 2014 at 7:42 AM, Ashok Gupta <[email protected]> wrote:
>
>> Hi,
>>
>> I have a theoretical question about the guarantees
>> OpaqueKafkaTridentKafkaSpout provides. I would like to use an example to
>> illustrate it.
>>
>> Suppose a batch with txId 10 has tuples t1, t2, t3, t4, which come from
>> Kafka partitions p1, p2, p3, p4 respectively. When this batch is played
>> for the very first time, processing fails; however, the commit to the
>> database happens for tuple t3 but not for tuples t1, t2, t4. Since the
>> batch failed, the metadata in ZooKeeper is not expected to be updated,
>> i.e. the offsets for p1, p2, p3, p4 will not be treated as committed. The
>> batch is expected to be replayed; however, suppose that before it gets
>> replayed, Kafka partition p3 goes down. What happens now? I understand
>> that another batch with the same transaction id containing t1, t2, t4 may
>> be replayed; however, since p3 is down, t3 won't be replayed. Since t3 is
>> not replayed, even if the batch succeeds on replay, the offsets for p3
>> don't get updated in ZooKeeper. That is all fine as far as fault
>> tolerance and opaque behavior are concerned.
>>
>> My concern is more about what happens when partition p3 is back up and
>> the spout starts reading data from the last offset it committed
>> successfully. Tuple t3 is going to be read from partition p3 again, and
>> it is certainly going to be in a batch with some txId > 10 (say 19), so
>> it is going to be applied to the state again. This apparently violates
>> the exactly-once semantics.
>>
>> Is the concern genuine, or am I missing something?
>>
>> Regards
>> --
>> Ashok Gupta,
>> (+1) 361-522-2172
>> San Jose, CA
>
> --
> *Abhishek Bhattacharjee*
> *Pune Institute of Computer Technology*

--
Ashok Gupta,
(+1) 361-522-2172
San Jose, CA
