James,

You can look at Confluent's Schema Registry implementation for reference:
http://docs.confluent.io/1.0/schema-registry/docs/index.html

It does something similar to what you described: a REST front end serves
data from a compacted topic, and the solution also provides HA.
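
To illustrate the general pattern (not the Schema Registry's actual code),
here is a minimal Python sketch of rebuilding an in-memory table by replaying
a keyed log from the earliest offset. A plain list stands in for the compacted
topic, and `rebuild_table` and the record shape are hypothetical names chosen
for this example. It assumes a record with a null value is a tombstone, which
is how log compaction marks a deleted key.

```python
from typing import Dict, Iterable, Optional, Tuple

# A record is (key, value). A value of None models a tombstone,
# i.e. a delete marker for that key. This shape is an assumption
# made for the sketch, not a Kafka client type.
Record = Tuple[str, Optional[str]]


def rebuild_table(records: Iterable[Record]) -> Dict[str, str]:
    """Replay records in offset order; the last write for each key wins."""
    table: Dict[str, str] = {}
    for key, value in records:
        if value is None:
            # Tombstone: drop the key if it is present.
            table.pop(key, None)
        else:
            table[key] = value
    return table


# Stand-in for the topic contents, oldest record first.
log = [("a", "1"), ("b", "2"), ("a", "3"), ("b", None), ("c", "4")]
table = rebuild_table(log)
```

A service following this pattern would start answering requests only after
the replay has reached the end of the topic, and would keep applying new
records to the same table afterwards.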

On Tue, 21 Jul 2015 at 09:25 James Cheng <jch...@tivo.com> wrote:

> Hi,
>
> I have a web service that serves up some data that it obtains from a kafka
> topic. When the process starts up, it wants to load the entire kafka topic
> into memory, and serve the data up from an in-memory hashtable. The data in
> the topic has primary keys and is log compacted, and so the total dataset
> will be small enough to fit in memory. My web service will only start
> serving up data when the entire topic is loaded. (And for that,
> https://issues.apache.org/jira/browse/KAFKA-1977 would be super useful).
>
> I am only storing this data in memory. In the event of process death or
> restart, my in-memory state is gone, and so I will always want to rebuild
> it by again consuming the topic from the earliest offset. I will never need
> to checkpoint my offsets.
>
> Also, I will have N instances of this application, each one needing to
> consume the entire topic. This is how I plan to do horizontal scaling of my
> web service.
>
> I would like to use the high level consumer, so that I don't need to
> manually discover which broker is the leader, and so that I don't have to
> handle leader rebalancing.
>
> A couple questions:
> 1) Does this use case make sense? Is this pattern used by anyone else? I
> like it because it makes my web service completely stateless.
> 2) In order to make each instance consume all partitions of the topic, I
> need each consumer group id to be unique to that process. So I was thinking
> of just using a UUID or something similar. What is the "cost" of creating a
> new consumer group id? If I am creating a new one every time I start my
> application, would I be cluttering up zookeeper or the __consumer_offsets
> topic? Note there will only ever be N instances of my application running.
> Since I never will need to checkpoint my offsets, does that affect my
> question about "cluttering up" zookeeper/kafka? Are old consumer groups
> ever cleaned up out of zookeeper or the __consumer_offsets topic?
> 3) Are the stored offsets used for any other reason, aside from at startup
> of a new consumer? Are offsets used after rebalancing when partition
> leaders change due to broker failure? I know that offsets can be used for
> Burrow-like monitoring.
> 4) Since I don't need support for checkpointing, another option is to use
> the SimpleConsumer. The sample code at
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> looks fairly comprehensive. It handles discovery of the partition leader,
> and handles leader rebalancing. Are there any other situations that I
> should be aware of before relying on that sample code?
> 5) Will any of this change when the new consumer comes out? Will the
> SimpleConsumer still exist when the new consumer comes out?
>
> Thanks,
> -James
>
>
