Multiple topics is the model I would recommend for what you have described.
At LinkedIn we run a wide mix of usage patterns across a lot of different
clusters. We have some topics that have one producer and one consumer
(queuing). We have some topics that are multi-producer (tracking and
metrics, mostly). Some of those are multi-consumer (tracking), and some are
mostly single-consumer (metrics). On top of all of this, we have a couple of
wildcard consumers that read everything (our audit system and the mirror
makers).

In your case, the rules engine sounds similar to our audit consumer: it
reads across most of the topics. For that reason, I would not decide how
many topics you need based on that consumer. Since the majority of what
you're describing is consumers who are interested in discrete data sets, go
with breaking out the topics based on that (all other things being equal).
Gwen is absolutely right about her guidelines, but of the two failure modes,
consuming and throwing away most of the data is the cardinal sin and should
be avoided. Multi-topic consumers are much less of a problem to deal with.
Personally, I wouldn't bother combining the messages into a separate topic
for the rules engine - I would just consume all the topics.
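
To make the "one topic per event type, and let the rules engine read them
all" layout concrete, here is a minimal sketch in plain Python. It only
models how each consumer would pick its topics. The topic names and the
topics_for helper are made up for illustration, and no Kafka client is
involved; real clients generally offer some form of multi-topic or
pattern-based subscription, so check your client's documentation for the
exact call.

```python
import re

# Hypothetical topic names: one topic per event type.
ALL_TOPICS = ["events.login", "events.logout", "events.purchase", "metrics.cpu"]


def topics_for(pattern, available):
    """Return the topics whose full name matches the regex pattern."""
    compiled = re.compile(pattern)
    return [topic for topic in available if compiled.fullmatch(topic)]


# A single-purpose consumer names exactly the one topic it cares about,
# so it never has to parse and discard other event types.
purchase_topics = topics_for(r"events\.purchase", ALL_TOPICS)

# The rules engine selects every event topic, much like a wildcard consumer.
rules_engine_topics = topics_for(r"events\..*", ALL_TOPICS)

print(purchase_topics)
print(rules_engine_topics)
```

The point of the sketch is that the broad consumer pays the cost of a longer
subscription list, while the narrow consumers never see data they would
throw away.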

You mentioned message ordering, and that can present an issue. You'd
likely have this problem regardless of how many topics you use, though, as
ordering is only guaranteed within a single partition. So you'd either have
to use one partition, or use a partitioning scheme on the messages (for
example, keying them so that related messages land in the same partition)
that makes strict ordering across all messages matter less. For ordering
purposes, having multiple topics is the same as having multiple partitions.
You need to decide how important ordering within Kafka is to your
application, and whether it can be handled separately inside the
application.
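
As a rough illustration of the partitioning point, the sketch below
simulates key-based partitioning in plain Python. Each partition is modeled
as an append-only list, and the partition count, the keys, and the CRC32
hash are all assumptions chosen for the example (real clients pick their own
hash function). It shows that messages sharing a key keep their relative
order, even though there is no ordering across partitions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count for the illustration


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition. CRC32 is only a stand-in hash here."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(messages, num_partitions=NUM_PARTITIONS):
    """Append each (key, value) message to the log of its key's partition."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def events_for(key, partitions, num_partitions=NUM_PARTITIONS):
    """Read one key's events back, in the order its partition stored them."""
    log = partitions[partition_for(key, num_partitions)]
    return [value for k, value in log if k == key]


# Hypothetical events, keyed by user, interleaved in production order.
messages = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "purchase"),
    ("user-2", "logout"),
    ("user-1", "logout"),
]
partitions = produce(messages)

print(events_for("user-1", partitions))
print(events_for("user-2", partitions))
```

Whether per-key ordering is enough, or whether the application has to
re-sequence events itself, is the decision described above.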

-Todd



On Thu, Oct 8, 2015 at 8:50 AM, Mark Drago <markdr...@gmail.com> wrote:

> Gwen,
>
> Thanks for your reply.  I understand all of the points you've made.  I
> think the challenge for us is that we have some consumers that are
> interested in messages of one type, but we also have a rules engine that is
> checking for events of many types and acting on them.
>
> If we put discrete event types on individual topics:
>   * Our rules engine would have to monitor many of these topics (10-20)
>   * Other consumers would only see messages they care about
>
> If we put all event types on one topic:
>   * Our rules engine only has to monitor one topic
>   * Other consumers would parse and then discard the majority of the
> messages that they see
>
> Perhaps a better approach would be to have different topics for the
> different use cases?  This would be similar to an approach that merges
> smaller topics together as needed.  So, each event type would be on its own
> topic but then we would put a subset of those messages on another topic
> that is destined for the rules engine.  The consumers that only care about
> one message type would listen on dedicated topics and the rules engine
> would just monitor one topic for all of the events that it cares about.  We
> would need to have something moving/creating messages on the rules engine
> topic.  We may also run into another set of problems as the ordering of
> messages of different types no longer exists as they're coming from
> separate topics.
>
> I'm curious to hear if anyone else has been in a similar situation and had
> to make a judgement call about the best approach to take.
>
> Thanks,
> Mark.
>
> > I usually approach these questions by looking at possible consumers. You
> > usually want each consumer to read from relatively few topics, use most
> > of the messages it receives and have fairly cohesive logic for using
> > these messages.
> >
> > Signs that things went wrong with too few topics:
> > * Consumers that throw away 90% of the messages on topics they read
> > * Consumers with gigantic switch statements for handling all the
> >   different message types they get
> >
> > Signs that you have too many topics:
> > * Every consumer needs to read messages from 20 different topics in order
> >   to construct the objects it actually uses
> >
> > If you ever did data modeling for a data warehouse, this will look a bit
> > familiar :)
> >
> > Gwen
> >
> > On Tue, Oct 6, 2015 at 4:46 PM, Mark Drago <markdr...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > At my organization we are already using kafka in a few areas, but we're
> > > looking to expand our use and we're struggling with how best to
> > > distribute our events on to topics.
> > >
> > > We have on the order of 30 different kinds of events that we'd like to
> > > distribute via kafka. We have one or two consumers that have a need to
> > > consume many of these types of events (~20 out of the 30) and we have
> > > other consumers that are only interested in one type of event.
> > >
> > > We're trying to decide between a model where we have one topic
> > > containing many kinds of events or a model where we have many topics
> > > each containing one type of event. We have also thought about mixed
> > > models where we have one large topic that we later break down into
> > > smaller ones or we have many small topics that we later combine into a
> > > large topic.
> > >
> > > I'm curious to hear about best practices and past experiences from the
> > > members of this group. What is the general best practice for reusing
> > > topics or creating new ones? What has worked well in the past? What
> > > should we be considering while making this decision?
> > >
> > > Thanks in advance!
> > > Mark.
>
