We are considering one of the three consumer-based flows below to achieve
"exactly once" processing from an input Kafka topic to a database that stores
the results (we use the plain Kafka consumer, but we also evaluated Kafka
Streams; details at the end), and we would like your input on them.
(For simplicity, assume that any exception exits the process, except an
exception thrown by step 5.)
Each of the flows below is executed in a loop.

Solution 1 - commit the db transaction first, then the offset:
1. read from Kafka
2. start a transaction in the db
3. update the target tables
4. commit the transaction
5. commit the offset to Kafka
6. if the offset commit failed, run another db transaction that reverts the
previous one (compensation).
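
To make the failure windows of the first flow concrete, here is a minimal
sketch of one loop iteration. It is a simulation, not real Kafka client code:
FakeConsumer, OffsetCommitError, apply_batch and run_once are hypothetical
names, a single partition is modelled in memory, and sqlite3 stands in for our
results database. The compensation in step 6 is modelled as re-applying the
same batch with the opposite sign, which assumes the updates are invertible
(e.g. additive aggregates).

```python
import sqlite3

class OffsetCommitError(Exception):
    """Hypothetical error: the simulated broker rejected the offset commit."""

class FakeConsumer:
    """Hypothetical in-memory stand-in for a consumer on one partition."""
    def __init__(self, records, fail_commit=False):
        self.records = records          # [(key, amount), ...] in offset order
        self.position = 0               # next offset to read
        self.committed = 0              # last committed offset
        self.fail_commit = fail_commit  # simulate a failure of step 5

    def poll(self, max_records=2):
        start = self.position
        batch = self.records[start:start + max_records]
        self.position += len(batch)
        return start, batch

    def commit(self, offset):
        if self.fail_commit:
            raise OffsetCommitError("offset commit failed")
        self.committed = offset

def apply_batch(db, batch, sign=1):
    # One db transaction: `with db` commits on success and rolls back on error.
    with db:
        for key, amount in batch:
            db.execute("INSERT OR IGNORE INTO totals(key, total) VALUES (?, 0)", (key,))
            db.execute("UPDATE totals SET total = total + ? WHERE key = ?",
                       (sign * amount, key))

def run_once(consumer, db):
    start, batch = consumer.poll()           # 1. read from Kafka
    if not batch:
        return "empty"
    apply_batch(db, batch)                   # 2.-4. update tables and commit
    try:
        consumer.commit(start + len(batch))  # 5. commit offset to Kafka
    except OffsetCommitError:
        apply_batch(db, batch, sign=-1)      # 6. compensating transaction
        consumer.position = start            # re-read the batch on the next loop
        return "compensated"
    return "ok"
```

Step 6 only helps when the process survives the failed commit. If it crashes
between steps 4 and 5, the db already holds the batch but the offset was never
committed, so the batch is applied a second time after the restart.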

Solution 2 - offsets persisted in the db in the same transaction; the consumer
starts from explicit offsets at init.
If the consumer can be configured with its starting offsets before it begins
consuming, then this flow would be possible:
0. at consumer process boot, read the latest offsets for each partition from
the db and configure the consumer to start from those.
1. read from Kafka (the first read starts from the explicit offsets; the
following polls just continue from there)
2. start a transaction in the db
3. update the target tables
3'. update an "offsets" table, keyed by consumer group and partition id
4. commit the transaction (which includes the offsets)
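
A minimal sketch of the second flow, again as an in-memory simulation: the
consumer class, the GROUP id and the table layout are hypothetical, and
sqlite3 plays the role of the database. The property it shows is that the
results and the next offset of each partition are written in one db
transaction, so after a restart the consumer resumes exactly where the last
committed transaction left off. With the Java client, our understanding is that
step 0 would map to KafkaConsumer.seek() after assign(), or to a
ConsumerRebalanceListener when using subscribe(); that is to be confirmed.

```python
import sqlite3

GROUP = "results-writer"  # hypothetical consumer group id

class FakeConsumer:
    """Hypothetical in-memory stand-in for a consumer assigned to several partitions."""
    def __init__(self, logs):
        self.logs = logs                       # {partition: [(key, amount), ...]}
        self.positions = {p: 0 for p in logs}  # next offset to read per partition

    def seek(self, partition, offset):
        self.positions[partition] = offset

    def poll(self, max_records=2):
        out = {}
        for p, log in self.logs.items():
            start = self.positions[p]
            batch = log[start:start + max_records]
            if batch:
                out[p] = (start, batch)
                self.positions[p] = start + len(batch)
        return out

def init_db(db):
    db.execute("CREATE TABLE IF NOT EXISTS totals("
               "key TEXT PRIMARY KEY, total INTEGER NOT NULL)")
    db.execute("CREATE TABLE IF NOT EXISTS offsets(grp TEXT, part INTEGER, "
               "next_offset INTEGER NOT NULL, PRIMARY KEY (grp, part))")

def restore_positions(db, consumer):
    # 0. at boot: start each partition from the offset saved in the db
    rows = db.execute("SELECT part, next_offset FROM offsets WHERE grp = ?", (GROUP,))
    for part, next_offset in rows:
        consumer.seek(part, next_offset)

def process(db, consumer):
    processed = 0
    for part, (start, batch) in consumer.poll().items():  # 1. read from Kafka
        with db:                                           # 2. start transaction
            for key, amount in batch:                      # 3. update target tables
                db.execute("INSERT OR IGNORE INTO totals(key, total) VALUES (?, 0)", (key,))
                db.execute("UPDATE totals SET total = total + ? WHERE key = ?",
                           (amount, key))
            db.execute("INSERT OR REPLACE INTO offsets(grp, part, next_offset) "
                       "VALUES (?, ?, ?)", (GROUP, part, start + len(batch)))  # 3'.
        processed += len(batch)                            # 4. committed when `with` exits
    return processed
```

In this sketch the offsets committed to Kafka play no role; the db is the
single source of truth for the consumer positions.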

Solution 3 - If it were possible to commit an explicit offset value to Kafka
for a (partition, consumer group), not just the current offset but a
previously saved one (saved at step 0), then another flow would be possible,
with steps 4 and 5 of Solution 1 reversed:
4. commit the offset to Kafka
5. commit the transaction
6. if the transaction commit failed, commit the old offset back to Kafka
(compensation), then exit or rewind the consumer.
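
A minimal sketch of the third flow, assuming the client can commit an arbitrary
(including a lower) offset. We believe the Java client's
commitSync(Map<TopicPartition, OffsetAndMetadata>) accepts explicit offsets,
but we have not verified the compensation case. FakeConsumer and run_once are
hypothetical, and the db commit failure is simulated by raising just before
the sqlite3 commit.

```python
import sqlite3

class FakeConsumer:
    """Hypothetical single-partition consumer that can commit an explicit offset."""
    def __init__(self, records):
        self.records = records  # [(key, amount), ...] in offset order
        self.position = 0       # next offset to read
        self.committed = 0      # last committed offset

    def poll(self, max_records=2):
        start = self.position
        batch = self.records[start:start + max_records]
        self.position += len(batch)
        return start, batch

    def commit(self, offset):
        # Explicit offset; may be lower than the current one (compensation).
        self.committed = offset

def run_once(consumer, db, fail_db_commit=False):
    saved = consumer.committed                  # 0. remember the previous offset
    start, batch = consumer.poll()              # 1. read from Kafka
    if not batch:
        return "empty"
    try:
        for key, amount in batch:               # 2.-3. sqlite3 opens the transaction implicitly
            db.execute("INSERT OR IGNORE INTO totals(key, total) VALUES (?, 0)", (key,))
            db.execute("UPDATE totals SET total = total + ? WHERE key = ?",
                       (amount, key))
        consumer.commit(start + len(batch))     # 4. commit offset to Kafka
        if fail_db_commit:
            raise sqlite3.OperationalError("simulated db commit failure")
        db.commit()                             # 5. commit the db transaction
    except sqlite3.Error:
        db.rollback()
        consumer.commit(saved)                  # 6. compensate: restore the old offset
        consumer.position = saved               # ...and rewind the consumer
        return "compensated"
    return "ok"
```

This is the mirror image of the first flow: if the process crashes between
steps 4 and 5, the offset is already committed but the results are not, so the
batch is lost rather than applied twice.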

Solution 4 - use Kafka Streams configured for exactly-once processing. This
seems to imply that the aggregates (the results of the processing), currently
stored in the db, would also need to be duplicated in Kafka as output topics
and in local RocksDB instances. Since even the aggregates are large, we are
exploring solutions close to exactly once that would not incur the cost of
storing the result "tables" twice.

Do you see any other possibility? What would you suggest to improve the
options above, or what is your advice?
Please advise,
Thank you,
Geschäftsanschrift/Business address: METRO SYSTEMS GmbH, Metro-Straße 12, 40235 
Düsseldorf, Germany
Aufsichtsrat/Supervisory Board: Heiko Hutmacher (Vorsitzender/ Chairman)
Geschäftsführung/Management Board: Dr. Dirk Toepfer (Vorsitzender/CEO), Wim van 
Sitz Düsseldorf, Amtsgericht Düsseldorf, HRB 18232/Registered Office 
Düsseldorf, Commercial Register of the Düsseldorf Local Court, HRB 18232