Streams does use one changelog topic per store (not just a single global
changelog topic per application). Thus, the number of partitions can be
different for different stores/changelog topics within one application.
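As a side note, the changelog topic name is derived per store. A tiny stand-alone sketch of the `<application.id>-<storeName>-changelog` naming convention (the application id and store names below are made up):

```java
public class ChangelogNameSketch {
    public static void main(String[] args) {
        String applicationId = "my-app";        // hypothetical application.id
        String[] stores = {"counts", "dedup"};  // hypothetical store names
        for (String store : stores)
            // Streams derives one changelog topic per store:
            System.out.println(applicationId + "-" + store + "-changelog");
        // prints:
        // my-app-counts-changelog
        // my-app-dedup-changelog
    }
}
```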

About partition assignment: it depends a little bit on the structure of
your program (i.e., the DAG structure), but in general, partitions of
different topics are co-located. Assume you have 2 input topics T1 and
T2 with 3 partitions each and 2 application instances: instance 1 would
get T1-0 and T2-0 assigned, and instance 2 would get T1-1 and T2-1. The
remaining partitions T1-2 and T2-2 belong to the same task and might end
up on either instance (so either both on instance 1 or both on
instance 2).
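The assignment above can be sketched as a small stand-alone simulation (plain Java, not the Streams API; the round-robin distribution at the end is a simplification of the real sticky assignor):

```java
import java.util.*;

public class TaskAssignmentSketch {
    public static void main(String[] args) {
        // 2 co-partitioned input topics with 3 partitions each, 2 instances
        String[] topics = {"T1", "T2"};
        int partitions = 3;
        int instances = 2;

        // Streams forms one task per partition number; task i reads
        // partition i of every co-partitioned topic.
        Map<Integer, List<String>> tasks = new TreeMap<>();
        for (int p = 0; p < partitions; p++) {
            List<String> assigned = new ArrayList<>();
            for (String t : topics) assigned.add(t + "-" + p);
            tasks.put(p, assigned);
        }

        // Distribute tasks over instances (round-robin here; the real
        // assignor is sticky and more elaborate).
        Map<Integer, List<Integer>> byInstance = new TreeMap<>();
        for (int taskId : tasks.keySet())
            byInstance.computeIfAbsent(taskId % instances,
                                       k -> new ArrayList<>()).add(taskId);

        System.out.println(tasks);
        // {0=[T1-0, T2-0], 1=[T1-1, T2-1], 2=[T1-2, T2-2]}
        System.out.println(byInstance);
        // {0=[0, 2], 1=[1]}  -- tasks 0 and 2 on one instance, task 1 on the other
    }
}
```

Note that partitions with the same number always travel together, which is exactly why T1-2 and T2-2 end up on the same instance.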

For this case, the changelog topic would have 3 partitions (same as the
input topics).

About modifying the input topics: this is not allowed and will break
your application. Once a changelog topic has been created, it is never
modified. Thus, if you change the number of input topic partitions, it
no longer matches the number of changelog topic partitions, and Streams
will raise an exception on startup. In this case, using the reset tool
is mandatory to fix the application.
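To see why the mismatch is fatal, consider how a key is routed to a partition. The sketch below uses a plain modulo on an integer key hash (the real default partitioner murmur2-hashes the serialized key bytes, but the effect is the same):

```java
public class PartitionMismatchSketch {
    // Simplified stand-in for Kafka's default key-to-partition routing.
    static int partition(int keyHash, int numPartitions) {
        return Math.floorMod(keyHash, numPartitions);
    }

    public static void main(String[] args) {
        int keyHash = 5;
        int before = partition(keyHash, 3); // changelog written with 3 partitions
        int after  = partition(keyHash, 4); // input later expanded to 4 partitions
        System.out.println(before + " vs " + after); // prints: 2 vs 1
        // The state for this key now lives in a changelog partition that no
        // longer lines up with the task processing the key, so Streams
        // refuses to start rather than restore inconsistent state.
    }
}
```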

This blog post gives more details:
https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/


-Matthias


On 7/14/17 5:53 AM, Eno Thereska wrote:
> Hi Debasish,
> 
> Your intuition about the first part is correct. Kafka Streams automatically 
> assigns a partition of a topic to 
> a task in an instance. It will never be the case that the same partition is 
> assigned to two tasks.
> 
> About the merging or changing of partitions part, it would help if we know 
> more about what you 
> are trying to do. For example, if behind the scenes you add or remove 
> partitions that would not work
> well with Kafka Streams. However, if you use Kafka Streams itself to 
> create new topics (e.g., 
> by merging two topics into one, or vice versa by taking one topic and 
> splitting it into more topics), then
> that would work fine.
> 
> Eno
> 
>> On 13 Jul 2017, at 23:49, Debasish Ghosh <ghosh.debas...@gmail.com> wrote:
>>
>> Hi -
>>
>> I have a question which is mostly to clarify some conceptions regarding
>> state management and restore functionality using Kafka Streams ..
>>
>> When I have multiple instances of the same application running (same
>> application id for each of the instances), are the following assumptions
>> correct ?
>>
>>   1. each instance has a separate state store (local)
>>   2. all instances are backed up by a *single* changelog topic
>>
>> Now the question arises, how does restore work in the above case when we
>> have 1 changelog topic backing up multiple state stores ?
>>
>> Each instance of the application ingests data from specific partitions of
>> the topic. And there can be multiple topics too. e.g. if we have m topics
>> with n partitions in each, and p instances of the application, then all the
>> (m x n) partitions are distributed across the p instances of the
>> application. Is this true ?
>>
>> If so, then does the changelog topic also have (m x n) partitions, so that
>> Kafka knows which state to restore in which store in case of a restore
>> operation ?
>>
>> And finally, if we decide to merge topics / partitions in between without
>> complete reset of the application, will (a) it work ? and (b) the changelog
>> topic gets updated accordingly and (c) is this recommended ?
>>
>> regards.
>>
>> -- 
>> Debasish Ghosh
>> http://manning.com/ghosh2
>> http://manning.com/ghosh
>>
>> Twttr: @debasishg
>> Blog: http://debasishg.blogspot.com
>> Code: http://github.com/debasishg
> 
