Hey Tom,

Thanks for your help :)

In terms of the number of topics, there would be 1 topic per service. Each service would be running multiple instances, but they would be in the same consumer group consuming the same topic.
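
For concreteness, this is roughly what each instance of a service would run. Every instance shares one group.id, so Kafka balances the topic's partitions across them (a minimal sketch; the broker address, group and topic names are placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ServiceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // Shared by every instance of the service, so the instances
            // form one consumer group over the service's topic.
            props.put("group.id", "orders-service");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("orders-service-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // service-specific processing goes here
                    System.out.println(record.offset() + ": " + record.value());
                }
            }
        }
    }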

I am expecting around 30 microservices at the moment, so around 30 topics. What's the upper limit on the number of topics?

In terms of the consumers, I plan to use Kafka offsets and to store them within Kafka. Since you mentioned that I cannot track when a topic was last read from, would the following work?

- Set Kafka to retain committed offsets for 30 days.
- Have a job that reads the offsets topic in Kafka and finds the committed offset for each service's topic.
- If no offset can be found for a topic, it was probably last read more than 30 days ago, so the topic can be deleted (rough sketch below).
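
To make that concrete, here's roughly what the cleanup job would do per service. Rather than decoding the internal __consumer_offsets topic by hand, I'd ask for the group's committed offset through the consumer API (a rough sketch; the group and topic names are made up):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class StaleTopicCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "orders-service"); // the group being checked
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("orders-service-events", 0);
                // committed() returns null once the group's offset has
                // expired (offsets.retention.minutes), i.e. nothing has
                // committed for longer than the retention window.
                OffsetAndMetadata committed = consumer.committed(tp);
                if (committed == null) {
                    System.out.println("no committed offset - deletion candidate");
                } else {
                    System.out.println("last committed offset: " + committed.offset());
                }
            }
        }
    }

Note that offsets.retention.minutes defaults to 1440 (one day) on 0.10 brokers, so it would need raising to cover a 30-day window.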

Cheers,
Francis

On 6/09/2016 8:47 PM, Tom Crayford wrote:
inline

On Mon, Sep 5, 2016 at 11:58 PM, F21 <f21.gro...@gmail.com> wrote:

Hi Tom,

Thank you so much for your response. I had a feeling that approach would
run into scalability problems, so thank you for confirming that.

Another approach would be to have each service request a subscription from the event store. The event store then creates a unique Kafka topic for each service. If multiple instances of a service request a subscription, the event store should only create the topic once and return the topic's name to each instance.

How many topics would you expect to have in this approach? Kafka has similar limits on the number of topics as it does on partitions (in fact the partition count drives the topic limit, since every topic must have at least one partition).


A reader/writer would read from HBase and push new messages into each
topic.

In this case, I would set my topics to retain messages for, say, 5 days, in case a service goes down and we need to bring it back up.

The event store would also query Kafka to see which topics have not been read from for, say, 30 days, and delete them. This would cover cases where a service is decommissioned. Does Kafka provide a way to check when a topic was last read from?

Not last read from, no. You can track the last produced message by looking at the latest message and its timestamp (timestamps were added in 0.10). I'd recommend tracking reads somewhere else, but it may be somewhat difficult. You could also potentially use consumer offsets for this, if your consumer is storing offsets in Kafka anyway.
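
Roughly, reading the newest message's timestamp would look like this (a sketch against the 0.10 consumer; the topic name is a placeholder):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class LastProducedAt {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("orders-service-events", 0);
                consumer.assign(Collections.singletonList(tp));
                consumer.seekToEnd(Collections.singletonList(tp));
                long end = consumer.position(tp);
                if (end > 0) {
                    consumer.seek(tp, end - 1); // step back to the newest message
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println("last produced at " + record.timestamp());
                    }
                }
            }
        }
    }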

Thanks

Tom Crayford
Heroku Kafka


Does this sound like a saner way?

Cheers,
Francis


On 5/09/2016 11:00 PM, Tom Crayford wrote:

inline

On Mon, Sep 5, 2016 at 12:00 AM, F21 <f21.gro...@gmail.com> wrote:

Hi all,
I am currently looking at using Kafka as a "message bus" for an event store. I plan to have all my events written into HBase for permanent storage and then have a reader/writer that reads from HBase and pushes them into Kafka.

In terms of Kafka, I plan to set it to keep all messages indefinitely. That way, if any consumers need to rebuild their views, or if new consumers are created, they can just replay the stream to rebuild the views.

Kafka isn't designed for permanent message storage, except via compacted topics. I suggest you rethink this unless compacted topics work for you (Kafka is not designed to keep unbounded amounts of data for unbounded amounts of time; it is designed to provide messaging and replay over short, bounded windows).
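
For illustration: a compacted topic keeps the newest record per key indefinitely, while a normal topic drops whole segments once they pass retention.ms. Creating one looks roughly like this (a sketch using the Java AdminClient from later Kafka releases; on 0.10 the same cleanup.policy config is set via the kafka-topics.sh tool, and the topic name here is made up):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // cleanup.policy=compact keeps the latest value per key
                // instead of deleting data after a time window.
                NewTopic snapshots = new NewTopic("customer-snapshots", 1, (short) 3)
                        .configs(Collections.singletonMap("cleanup.policy", "compact"));
                admin.createTopics(Collections.singletonList(snapshots)).all().get();
            }
        }
    }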


I plan to use domain-driven design and will use the concept of aggregates in the system. An example of an aggregate might be a customer. All events for a given aggregate need to be delivered in order. In the case of Kafka, I would need to over-partition the system by a lot, as any change in the number of partitions could result in messages that were bound for a given partition being pushed into a newly created partition. Are there any issues if I create a new partition every time an aggregate is created? In a system with a large number of aggregates, this would result in millions or hundreds of millions of partitions. Will this cause performance issues?

Yes.
Kafka is designed to support hundreds to thousands of partitions per
machine, not millions (and there is an upper bound per cluster which is
well below one million). I suggest you rethink this and likely use a
standard "hash based partitioning" scheme.


Cheers,
Francis



