Hey Tom,

Thanks for your help :)

In terms of the number of topics, there would be 1 topic per service. Each service would be running multiple instances, but they would be in the same consumer group consuming the same topic.
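
For concreteness, this is roughly what each instance of a service would run. Every instance shares one group.id, so Kafka balances the topic's partitions across them (a minimal sketch; the broker address, group and topic names are placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ServiceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // Shared by every instance of the service, so the instances
            // form one consumer group over the service's topic.
            props.put("group.id", "orders-service");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("orders-service-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // service-specific processing goes here
                    System.out.println(record.offset() + ": " + record.value());
                }
            }
        }
    }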

I am expecting around 30 microservices at the moment, so around 30 topics. What's the upper limit on the number of topics?

In terms of the consumers, I plan to use Kafka offsets and to store them within Kafka. Since you mentioned that I cannot track when a topic was last read from, would the following work?

- Set Kafka to retain committed offsets for 30 days.
- Have a job that reads the offsets topic in Kafka and finds the committed offset for each service's topic.
- If no offset can be found for a topic, it was probably last read more than 30 days ago, so the topic can be deleted (rough sketch below).
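
To make that concrete, here's roughly what the cleanup job would do per service. Rather than decoding the internal __consumer_offsets topic by hand, I'd ask for the group's committed offset through the consumer API (a rough sketch; the group and topic names are made up):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class StaleTopicCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "orders-service"); // the group being checked
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("orders-service-events", 0);
                // committed() returns null once the group's offset has
                // expired (offsets.retention.minutes), i.e. nothing has
                // committed for longer than the retention window.
                OffsetAndMetadata committed = consumer.committed(tp);
                if (committed == null) {
                    System.out.println("no committed offset - deletion candidate");
                } else {
                    System.out.println("last committed offset: " + committed.offset());
                }
            }
        }
    }

Note that offsets.retention.minutes defaults to 1440 (one day) on 0.10 brokers, so it would need raising to cover a 30-day window.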

Cheers,
Francis

On 6/09/2016 8:47 PM, Tom Crayford wrote:
inline

On Mon, Sep 5, 2016 at 11:58 PM, F21 <f21.gro...@gmail.com> wrote:

Hi Tom,

Thank you so much for your response. I had a feeling that approach would
run into scalability problems, so thank you for confirming that.

Another approach would be to have each service request a subscription from the event store. The event store then creates a unique Kafka topic for each service. If multiple instances of a service request a subscription, the event store should only create the topic once and return the topic's name to each instance.

How many topics would you expect to have in this approach? Kafka has similar limits on the number of topics as it does on partitions (in fact the partition count drives the topic limit, since every topic must have at least one partition).


A reader/writer would read from HBase and push new messages into each
topic.

In this case, I would set my topics to retain messages for, say, 5 days, in case a service goes down and we need to bring it back up.

The event store would also query Kafka to see which topics have not been read from for, say, 30 days, and delete them. This would cover cases where a service is decommissioned. Does Kafka provide a way to check when a topic was last read from?

Not last read from, no. You can track the last produced message by looking at the latest message and its timestamp (timestamps were added in 0.10). I'd recommend tracking reads somewhere else, but it may be somewhat difficult. You could also potentially use consumer offsets for this, if your consumer is storing offsets in Kafka anyway.
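
Roughly, reading the newest message's timestamp would look like this (a sketch against the 0.10 consumer; the topic name is a placeholder):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class LastProducedAt {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("orders-service-events", 0);
                consumer.assign(Collections.singletonList(tp));
                consumer.seekToEnd(Collections.singletonList(tp));
                long end = consumer.position(tp);
                if (end > 0) {
                    consumer.seek(tp, end - 1); // step back to the newest message
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println("last produced at " + record.timestamp());
                    }
                }
            }
        }
    }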

Thanks

Tom Crayford
Heroku Kafka


Does this sound like a saner way?

Cheers,
Francis


On 5/09/2016 11:00 PM, Tom Crayford wrote:

inline

On Mon, Sep 5, 2016 at 12:00 AM, F21 <f21.gro...@gmail.com> wrote:

Hi all,
I am currently looking at using Kafka as a "message bus" for an event store. I plan to have all my events written into HBase for permanent storage and then have a reader/writer that reads from HBase and pushes them into Kafka.

In terms of Kafka, I plan to set it to keep all messages indefinitely. That way, if any consumers need to rebuild their views, or if new consumers are created, they can just replay the stream to rebuild the views.

Kafka isn't designed for permanent message storage, except via compacted topics. I suggest you rethink this unless compacted topics work for you (Kafka is not designed to keep unbounded amounts of data for unbounded amounts of time; it is designed to provide messaging and replay over short, bounded windows).
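
For illustration: a compacted topic keeps the newest record per key indefinitely, while a normal topic drops whole segments once they pass retention.ms. Creating one looks roughly like this (a sketch using the Java AdminClient from later Kafka releases; on 0.10 the same cleanup.policy config is set via the kafka-topics.sh tool, and the topic name here is made up):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // cleanup.policy=compact keeps the latest value per key
                // instead of deleting data after a time window.
                NewTopic snapshots = new NewTopic("customer-snapshots", 1, (short) 3)
                        .configs(Collections.singletonMap("cleanup.policy", "compact"));
                admin.createTopics(Collections.singletonList(snapshots)).all().get();
            }
        }
    }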


I plan to use domain-driven design and will use the concept of aggregates in the system. An example of an aggregate might be a customer. All events for a given aggregate need to be delivered in order. In the case of Kafka, I would need to over-partition the system by a lot, as any change in the number of partitions could result in messages that were bound for a given partition being pushed into a newly created partition. Are there any issues if I create a new partition every time an aggregate is created? In a system with a large number of aggregates, this would result in millions or hundreds of millions of partitions. Will this cause performance issues?

Yes.
Kafka is designed to support hundreds to thousands of partitions per
machine, not millions (and there is an upper bound per cluster which is
well below one million). I suggest you rethink this and likely use a
standard "hash based partitioning" scheme.


Cheers,
Francis



