That should work, though it sounds like you may be interested in KIP-98:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

If you can include the 'batch_id' inside your messages, and define custom
control messages with a control topic, then you would not need one topic
per batch, and you would be very close to the essence of the above proposal.
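
To make the idea concrete, here is a rough sketch of the consumer-side
bookkeeping, in plain Python with no Kafka client involved. The class name,
method names, and the "batch_end" control kind carrying a message count are
all assumptions made for illustration (the count mirrors step 3 of your
plan); none of this is taken from KIP-98 or any Kafka API. It also assumes
each data message is seen exactly once, so a real consumer would still need
to deduplicate.

```python
from collections import defaultdict


class BatchAssembler:
    """Hypothetical helper: buffers messages per batch_id and releases a
    batch only once the batch_end control message and all data have arrived."""

    def __init__(self):
        self._buffers = defaultdict(list)  # batch_id -> payloads seen so far
        self._expected = {}                # batch_id -> count from batch_end

    def on_data(self, batch_id, payload):
        """Handle a message from the single shared data topic."""
        self._buffers[batch_id].append(payload)
        return self._release_if_complete(batch_id)

    def on_control(self, batch_id, kind, count=None):
        """Handle a message from the control topic."""
        if kind == "batch_end":
            self._expected[batch_id] = count
        return self._release_if_complete(batch_id)

    def _release_if_complete(self, batch_id):
        expected = self._expected.get(batch_id)
        # Not complete until batch_end has been seen AND the count matches.
        if expected is None or len(self._buffers[batch_id]) < expected:
            return None
        del self._expected[batch_id]
        return self._buffers.pop(batch_id)


assembler = BatchAssembler()
assembler.on_control("b1", "batch_start")
assembler.on_data("b1", "m1")
assembler.on_control("b1", "batch_end", count=2)  # arrives before the last message
ready = assembler.on_data("b1", "m2")             # batch is released here
```

Because the control topic and the data topic are consumed independently,
batch_end can show up before the last data message, which is why the check
runs on both paths. That covers what step 6 is trying to guard against
without a separate throwaway consumer.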

Thanks,
Apurva

On Fri, Dec 2, 2016 at 5:02 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Heya,
>
> I need to send a group of messages, which are all related, and then process
> those messages, only when all of them have arrived.
>
> Here is how I'm planning to do this. Is this the right way, and can any
> improvements be made to this?
>
> 1) Send a message to a topic called batch_start, with a batch id (which
> will be a UUID)
>
> 2) Post the messages to a topic called batch_msgs_<batch_id>. Here batch_id
> will be the batch id sent in batch_start.
>
> The number of messages sent will be recorded by the producer.
>
> 3) Send a message to batch_end with the batch id and the number of sent
> messages.
>
> 4) On the consumer side, using Kafka Streaming, I would listen to
> batch_end.
>
> 5) When the message there arrives, I will start another instance of Kafka
> Streaming, which will process the messages in batch_msgs_<batch_id>
>
> 6) Perhaps to be extra safe, whenever batch_end arrives, I will start a
> throwaway consumer which will just count the number of messages in
> batch_msgs_<batch_id>. If these don't match the # of messages specified in
> the batch_end message, then it will assume that the batch hasn't yet
> finished arriving, and it will wait for some time before retrying. Once the
> correct # of messages have arrived, THEN it will trigger step 5 above.
>
> Will the above method work, or should I make any changes to it?
>
> Is step 6 necessary?
>
