None of these questions are naive, so no worries. Answer inline:

> During restore why does Kafka replay the whole topic / partition to recreate 
> the state in the local state store ? Isn't there any way to just have the 
> latest message as the current state ? Because that's what it is .. right ? 
> The last message in the topic / partition IS the latest state. May be I am 
> missing something obvious ?

Let's say Kafka Streams is doing an aggregation, e.g., computing sum(). For 
each key, it computes the new sum() as records arrive, writes the result to 
the changelog topic in Kafka, and keeps a copy locally in RocksDB. Now, after a 
failure, a fresh instance comes along with no local state in RocksDB, so that 
state has to be reconstructed. You are right that only the latest value for a 
key is needed. That is what the changelog gives you: it is a compacted topic, 
so Kafka's log compaction discards older values and retains the latest value 
for each key. Compaction runs in the background, so a restore may still read 
some older values for a key, but later values simply overwrite them. So what 
you are describing is effectively what happens.
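
To make this concrete, here is a rough sketch in plain Python (not the Kafka 
Streams API; aggregate_sum, compact and restore are hypothetical names made up 
for this illustration). It models the changelog as a list of (key, running sum) 
records and shows that replaying the full log and replaying a compacted log 
rebuild the same store:

```python
from collections import defaultdict


def aggregate_sum(records):
    """Consume (key, value) input records and keep a running sum per key.

    Returns the final local state and the changelog, i.e. one
    (key, new_sum) record for every update made to the state.
    """
    state = defaultdict(int)
    changelog = []
    for key, value in records:
        state[key] += value
        changelog.append((key, state[key]))
    return dict(state), changelog


def compact(changelog):
    """Model of log compaction: keep only the last record for each key."""
    last_index = {key: i for i, (key, _) in enumerate(changelog)}
    return [rec for i, rec in enumerate(changelog) if last_index[rec[0]] == i]


def restore(changelog):
    """Rebuild an empty store by replaying the changelog; last write wins."""
    store = {}
    for key, value in changelog:
        store[key] = value
    return store


records = [("a", 1), ("b", 10), ("a", 2), ("a", 3), ("b", 5)]
state, changelog = aggregate_sum(records)
compacted = compact(changelog)

# Both replays end in the same store; the compacted one reads fewer records.
print(restore(changelog) == restore(compacted) == state)
print(len(changelog), "records vs", len(compacted), "after compaction")
```

The point of the sketch is only that replay is "last write wins" per key, so 
the amount of data read during restore depends on how far compaction has 
progressed, not on the final state.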

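Note that if state restoration ends up taking too long for your application's 
needs, consider using standby tasks (configured via num.standby.replicas), 
which keep a copy of the state up to date on another instance.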
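
As a rough illustration of why a standby helps, here is a plain Python model 
(again not the Kafka Streams API; StoreReplica and catch_up are hypothetical 
names). A replica that has been following the changelog only needs to apply 
the records it has not seen yet, while a cold replica replays everything:

```python
class StoreReplica:
    """A local key-value store that remembers how far into the changelog it has read."""

    def __init__(self):
        self.data = {}
        self.offset = 0  # next changelog position to apply

    def catch_up(self, changelog):
        """Apply every record from self.offset onward; return how many were applied."""
        pending = changelog[self.offset:]
        for key, value in pending:
            self.data[key] = value
        self.offset = len(changelog)
        return len(pending)


changelog = [("a", 1), ("b", 10), ("a", 3), ("a", 6)]

# The standby follows the changelog while the active task is still running.
standby = StoreReplica()
standby.catch_up(changelog)

# One more update is written before the active instance fails.
changelog.append(("b", 15))

cold = StoreReplica()                        # fresh instance, no local state
cold_replayed = cold.catch_up(changelog)     # replays the whole changelog
warm_replayed = standby.catch_up(changelog)  # replays only the missing tail

print(cold_replayed, warm_replayed, cold.data == standby.data)
```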

Hope this helps,

> regards.
> On Fri, Jul 14, 2017 at 6:23 PM, Eno Thereska wrote:
> Hi Debasish,
>
> Your intuition about the first part is correct. Kafka Streams automatically
> assigns a partition of a topic to a task in an instance. It will never be the
> case that the same partition is assigned to two tasks.
>
> About the merging or changing of partitions part, it would help if we knew
> more about what you are trying to do. For example, if behind the scenes you
> add or remove partitions, that would not work well with Kafka Streams.
> However, if you use Kafka Streams itself to create new topics (e.g., by
> merging two topics into one, or vice versa by taking one topic and splitting
> it into more topics), then that would work fine.
>
> Eno
> > On 13 Jul 2017, at 23:49, Debasish Ghosh wrote:
> >
> > Hi -
> >
> > I have a question which is mostly to clarify some conceptions regarding
> > state management and restore functionality using Kafka Streams ..
> >
> > When I have multiple instances of the same application running (same
> > application id for each of the instances), are the following assumptions
> > correct ?
> >
> >   1. each instance has a separate state store (local)
> >   2. all instances are backed up by a *single* changelog topic
> >
> > Now the question arises, how does restore work in the above case when we
> > have 1 changelog topic backing up multiple state stores ?
> >
> > Each instance of the application ingests data from specific partitions of
> > the topic. And there can be multiple topics too. e.g. if we have m topics
> > with n partitions in each, and p instances of the application, then all the
> > (m x n) partitions are distributed across the p instances of the
> > application. Is this true ?
> >
> > If so, then does the changelog topic also has (m x n) partitions, so that
> > Kafka knows which state to restore in which store in case of a restore
> > operation ?
> >
> > And finally, if we decide to merge topics / partitions in between without
> > complete reset of the application, will (a) it work ? and (b) the changelog
> > topic gets updated accordingly and (c) is this recommended ?
> >
> > regards.
> >
> > --
> > Debasish Ghosh
> >
> > Twttr: @debasishg
> -- 
> Debasish Ghosh
> Twttr: @debasishg
