Hi Edvard, interesting problem – I've run into similar issues with high fan-out
use cases, but only in demo applications where I was more interested in scale
than ordering. For examples, have a look at this list of blog posts, which
includes Anomalia Machina (Kafka + Cassandra) and Kongo (Kafka + IoT):
https://www.linkedin.com/pulse/complete-guide-apache-kafka-developers-everything-i-know-paul-brebner/

Regards, Paul

From: Edvard Fagerholm <edvard.fagerh...@gmail.com>
Date: Tuesday, 30 May 2023 at 6:55 am
To: users@kafka.apache.org <users@kafka.apache.org>
Subject: Patterns for generating ordered streams

Hi there,

Kafka is new to us, so we don't have operational experience with it. I'm
building a system that broadcasts events to groups of clients, with Kafka
consumers handling the broadcasting. In other words,

kafka topic --- consumers --- clients

We need to maintain the order of the events as they come through Kafka and
clients need to be able to retrieve from a DB any events posted while they
were offline. This means that the consumer also writes each event to a DB.
The number of clients is in the hundreds of millions.

What I'm trying to understand is what I can use to sort the events in the
DB. A natural candidate is the Kafka partition offset: include it as a range
key in the DB and sort on it in queries, since it guarantees the same order
as the topic. I would not use timestamps generated by the consumer, since
they aren't idempotent: a consumer could fail before acking its last batch,
and the replacement consumer would stamp the replayed events with different
times.
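
Concretely, I'm picturing something like the sketch below on the consumer side
(configuration trimmed to the essentials; writeToDb is a made-up stand-in for
whatever upsert the DB exposes):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class BroadcastConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder address
            props.put("group.id", "broadcast-consumers");     // placeholder group
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("events"));
                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> rec : records) {
                        // (partition, offset) as the range key: a replay after a
                        // crash rewrites the same rows, so the write is idempotent.
                        writeToDb(rec.partition(), rec.offset(), rec.value());
                    }
                    consumer.commitSync(); // commit only once the batch is in the DB
                }
            }
        }

        // Stand-in for the real DB upsert keyed by (partition, offset).
        static void writeToDb(int partition, long offset, String event) {}
    }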

One problem with the partition offset that I haven't been able to find an
answer to in the docs is operational: if we ever wanted to move a topic to a
new cluster, the partitions of the topic in the new cluster would need to
start their offsets where they ended in the old cluster. Is it possible to
set initial offsets for each partition when creating a topic?

Have other users solved this in other ways? Another idea I've had is to keep
a sequence number in memory on the consumer and store offset -> sequence
number mappings in the DB. When committing, the consumer would delete the
mappings from older batches. This lets a new consumer synchronize with any
previous consumer and keeps the numbering idempotent.
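
Per partition, that would look roughly like the sketch below. The DB helpers
are made-up names for whatever idempotent upserts the store provides; the key
point is that replaying from the last committed offset reassigns exactly the
same sequence numbers:

    import java.time.Duration;
    import java.util.List;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SequencedConsumer {
        // Hypothetical DB helpers -- illustrative names only.
        static long loadSequenceAt(TopicPartition tp, long offset) { return 0L; }
        static void saveMapping(TopicPartition tp, long offset, long seq) {}
        static void pruneMappingsBefore(TopicPartition tp, long offset) {}
        static void writeEvent(TopicPartition tp, long seq, String value) {}

        static void run(KafkaConsumer<String, String> consumer, TopicPartition tp) {
            consumer.assign(List.of(tp));
            long next = consumer.position(tp);   // next offset to be read
            long seq = loadSequenceAt(tp, next); // counter stored by the predecessor
            while (true) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> rec : records) {
                    writeEvent(tp, seq, rec.value()); // upsert keyed by (partition, seq)
                    seq++;
                }
                long committed = consumer.position(tp);
                saveMapping(tp, committed, seq); // lets a successor resume from here
                consumer.commitSync();
                pruneMappingsBefore(tp, committed); // older batches can't be replayed
            }
        }
    }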

I'm assuming this is a problem that has concerned many Kafka users. What we
can't do is maintain a sequence number in the DB and update it on each event:
we use Cassandra for ingesting the events, and it has no counter that can be
read and incremented without a lightweight transaction, which would be far
too expensive on every insert at our load.
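
For context, the only way I know to get read-and-increment semantics in
Cassandra is a compare-and-set on a regular column via a lightweight
transaction, roughly as sketched below (the broadcast.counters table, with
columns name text and seq bigint, is made up for illustration):

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.ResultSet;
    import com.datastax.oss.driver.api.core.cql.Row;

    public class LwtCounter {
        public static void main(String[] args) {
            try (CqlSession session = CqlSession.builder().build()) {
                long next;
                ResultSet rs;
                do {
                    // Read the current value, then attempt a compare-and-set.
                    // The conditional UPDATE is a lightweight transaction, i.e.
                    // a Paxos round across replicas -- doing this per insert is
                    // what makes the approach too expensive at high ingest rates.
                    Row row = session.execute(
                        "SELECT seq FROM broadcast.counters WHERE name = 'events'").one();
                    next = row.getLong("seq") + 1;
                    rs = session.execute(
                        "UPDATE broadcast.counters SET seq = " + next
                        + " WHERE name = 'events' IF seq = " + (next - 1));
                } while (!rs.wasApplied()); // another writer won the race; retry
                System.out.println("allocated sequence number " + next);
            }
        }
    }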

Best,
Edvard
