Hi Guillermo, 1). It will have two rows: {"hello" => 2} and {"world" => 1}.
2). That is correct. Note that changelog records the most recent values for each key, so if you do not delete the data, the new "hello" => 3 record would practically make the previous two "hello" => 1 and "hello" => 2 obsolete. 3). Changelog data is used for restoration only. If you delete the topic, but no failure has ever happened, then everything is OK; but if you ever have a failure, you cannot restore the state as the changelog data is not available anymore. 4). Any operations that have associated state stores will have changelog topic by default; for KTable joins it will also have changelog for its materialized stream data. Guozhang On Tue, Apr 19, 2016 at 10:16 AM, Guillermo Lammers Corral < guillermo.lammers.cor...@tecsisa.com> wrote: > Hello, > > I read in the docs that Kafka Streams stores the computed aggregations in a > local embedded key-value store (RocksDB by default), i.e., Kafka Streams > provides so-called state stores. I'm wondering about the relationship > between each state store and its replicated changelog Kafka topic. > > If we use the WorkCountDemo example I would like to know what are the > sequence of events to understand both concepts when we execute something > like that: > > I send to topic the words: "hello", "world", "hello" > > Changelog topic messages: > > "hello" => 1 > "world" => 1 > "hello" => 2 > > It's ok. > > Q1) What is the status of RocksDB at this moment? > > Q2) If I delete all data in the changelog Kafka topic and send a new > "hello", I can see in the changelog topic: > > "hello" => 3 > > What is happening? CountByKey are counting using the RocksDB data stored > prior to updating the changelog again? > > Q3) Could someone explain me in depth the following process step by step? > What happen if the changelog Kafka topic was deleted as I did in the > question above? > > "If tasks run on a machine that fails and are restarted on another machine, > Kafka Streams guarantees to restore their associated state stores to the > content before the failure by replaying the corresponding changelog topics > prior to resuming the processing on the newly started tasks." > > Q4) What are the differences with operations without changelog Kafka topic > associated like joins between two KTables when machine fails occurs and we > need fault tolerance policy? > > Thanks! > -- -- Guozhang