Heya,

I need to send a group of related messages and then process them, but only
once all of them have arrived.

Here is how I'm planning to do it. Is this the right approach, and can it
be improved?

1) Send a message containing a batch id (a UUID) to a topic called
batch_start.

2) Post the messages to a topic called batch_msgs_<batch_id>, where
batch_id is the batch id sent in batch_start.

The producer records the number of messages it sends.

3) Send a message to a topic called batch_end containing the batch id and
the number of messages sent.

4) On the consumer side, I will use Kafka Streams to listen to batch_end.

5) When a message arrives on batch_end, I will start another Kafka Streams
instance to process the messages in batch_msgs_<batch_id>.

6) Perhaps, to be extra safe, whenever a batch_end message arrives I will
start a throwaway consumer that just counts the messages in
batch_msgs_<batch_id>. If the count doesn't match the number of messages
specified in the batch_end message, it will assume the batch hasn't
finished arriving yet and wait for a while before retrying. Once the
correct number of messages has arrived, THEN it will trigger step 5 above.
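To make the control flow of steps 1-6 concrete, here is a minimal, hypothetical sketch in Python. It assumes Kafka topics can be modelled as an in-memory dict of lists. It does not use any real Kafka client or Kafka Streams API. The names send_batch, batch_complete and on_batch_end, and the max_retries/wait parameters, are invented for illustration. It only shows the count-and-retry logic, not a production implementation.

```python
import uuid
from collections import defaultdict

# ASSUMPTION: topics are modelled as an in-memory dict (topic name -> list of
# messages). A real setup would use a Kafka producer/consumer instead.
topics = defaultdict(list)


def send_batch(payloads):
    """Producer side, steps 1-3 (hypothetical helper)."""
    batch_id = str(uuid.uuid4())
    topics["batch_start"].append({"batch_id": batch_id})      # step 1
    sent = 0
    for payload in payloads:                                  # step 2
        topics[f"batch_msgs_{batch_id}"].append(payload)
        sent += 1                                             # producer records the count
    topics["batch_end"].append({"batch_id": batch_id, "count": sent})  # step 3
    return batch_id


def batch_complete(end_msg):
    """Step 6: compare the messages seen so far with the count in batch_end."""
    arrived = len(topics[f"batch_msgs_{end_msg['batch_id']}"])
    return arrived == end_msg["count"]


def on_batch_end(end_msg, process, max_retries=5, wait=lambda attempt: None):
    """Steps 4-6: on batch_end, retry until the batch is complete, then run step 5."""
    for attempt in range(max_retries):
        if batch_complete(end_msg):
            # Step 5: process every message in batch_msgs_<batch_id>.
            return process(topics[f"batch_msgs_{end_msg['batch_id']}"])
        wait(attempt)  # hypothetical hook; a real consumer might sleep with backoff here
    raise TimeoutError(
        f"batch {end_msg['batch_id']} incomplete after {max_retries} checks"
    )


if __name__ == "__main__":
    send_batch(["a", "b", "c"])
    end = topics["batch_end"][-1]
    print(on_batch_end(end, process=sorted))  # ['a', 'b', 'c']
```

Passing the wait strategy in as a parameter keeps the retry policy (fixed delay, backoff, or giving up) separate from the completeness check, which is the part step 6 actually adds.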

Will this approach work, or should I change anything?

Is step 6 necessary?
