Do you have to separate the snapshot from the "normal" update flow?

I've used a compacted Kafka topic as the source of truth for a Solr
database, feeding the topic with both real-time updates and "snapshots" from
a Hive job. This worked very well. The nice part is that there is a
seamless transition between loading the initial state and continuing with
incremental updates afterwards, so there is no need to detect whether you
are at the end of the stream. (Just the usual "monitor the lag"
problem...)
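
For what it's worth, the "am I caught up" part reduces to a plain lag
computation: capture the end offsets once when you start, then compare the
positions you have consumed to against them. A minimal sketch under that
assumption (the offset maps are plain dicts keyed by (topic, partition)
here; with a real client you would fetch them from the consumer API):

```python
def compute_lag(end_offsets, positions):
    """Per-partition lag: how far each consumed position trails the
    end offset captured at startup. Missing positions count from 0."""
    return {tp: end_offsets[tp] - positions.get(tp, 0) for tp in end_offsets}

def is_caught_up(end_offsets, positions, max_lag=0):
    """True once every partition is within max_lag of the captured end."""
    return all(lag <= max_lag
               for lag in compute_lag(end_offsets, positions).values())

# Example: offsets keyed by (topic, partition)
end_offsets = {("content", 0): 100, ("content", 1): 40}
positions   = {("content", 0): 100, ("content", 1): 35}
print(compute_lag(end_offsets, positions))   # {('content', 0): 0, ('content', 1): 5}
print(is_caught_up(end_offsets, positions))  # False
```

Treating "caught up" as "lag below a threshold" rather than "at the exact
end" is what makes the transition from initial load to incremental updates
seamless.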
2015-02-18 19:18 GMT+01:00 Will Funnell <w.f.funn...@gmail.com>:

> We are currently using Kafka 0.8.1.1 with log compaction in order to
> provide streams of messages to our clients.
>
> As well as constantly consuming the stream, one of our use cases is to
> provide a snapshot, meaning the user will receive a copy of every message
> at least once.
>
> Each one of these messages represents an item of content in our system.
>
>
> The problem comes when determining if the client has actually reached the
> end of the topic.
>
> The standard Kafka way of detecting this seems to be relying on a
> ConsumerTimeoutException, but we frequently get this exception before the
> end of the topic has actually been reached, or conversely it can take a
> long time for a timeout to occur naturally.
>
>
> On first glance it would seem possible to do a lookup for the max offset
> for each partition when you begin consuming, stopping when this position is
> reached.
>
> But with log compaction, if an update arrives for an existing message key,
> it is appended beyond that offset and the earlier copy may be cleaned
> away, so the snapshot would be incomplete.
>
>
> Another thought is to make use of the cleaner point. Currently Kafka writes
> out to a "cleaner-offset-checkpoint" file in each data directory which is
> written to after log compaction completes.
>
> If the consumer were able to access the cleaner-offset-checkpoint, it
> could consume up to this point, then check that the checkpoint was still
> the same and that no further compaction had occurred, and thereby
> determine it had received everything at least once. (Assuming there is no
> race condition between compaction and writing to the file.)
>
>
> Has anybody got any thoughts?
>
> Will
>
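
On the cleaner-offset-checkpoint idea: those checkpoint files are plain
text, and to my knowledge the layout is a version line, an entry-count
line, then one `topic partition offset` line per entry. That layout is an
assumption on my part (do verify it against your broker version), but if it
holds, the read-consume-reread check Will describes could be sketched as:

```python
def read_cleaner_checkpoint(path):
    """Parse a cleaner-offset-checkpoint file into {(topic, partition): offset}.
    Assumed layout (verify against your Kafka version): a version line, an
    entry-count line, then 'topic partition offset' lines."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    _version, count = int(lines[0]), int(lines[1])
    entries = {}
    for line in lines[2:2 + count]:
        topic, partition, offset = line.rsplit(" ", 2)
        entries[(topic, int(partition))] = int(offset)
    return entries

def snapshot_still_valid(path, before):
    """After consuming up to the 'before' checkpoint offsets, re-read the
    file; if nothing moved, no further compaction ran during the snapshot
    (modulo the compaction/file-write race mentioned above)."""
    return read_cleaner_checkpoint(path) == before
```

If the second read differs from the first, you would have to restart the
snapshot from the new checkpoint rather than trust what was consumed.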