Re: Kafka Streams State Store and RocksDB

Guozhang Wang Tue, 19 Apr 2016 18:07:15 -0700

Hi Guillermo,

1). It will have two rows: {"hello" => 2} and {"world" => 1}.

2). That is correct: the count continues from the value held in the local
RocksDB store. Note that the changelog records the most recent value for
each key, so even if you had not deleted the data, the new "hello" => 3
record would make the previous "hello" => 1 and "hello" => 2 records
obsolete.

3). Changelog data is used for restoration only. If you delete the topic,
but no failure has ever happened, then everything is OK; but if you ever
have a failure, you cannot restore the state as the changelog data is not
available anymore.

4). Any operation that has an associated state store will have a changelog
topic by default; for KTable joins, there will also be a changelog for the
materialized stream data.
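
To make points 1) to 3) concrete, here is a minimal sketch in plain Java. It
is not the Kafka Streams API: the class CountingStoreSketch, its fields, and
its methods are hypothetical stand-ins, with a TreeMap playing the role of the
local RocksDB store and a List playing the role of the changelog topic. It
only assumes the behavior described above: every update is written to the
local store and appended to the changelog, and restoration replays the
changelog.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a counting state store backed by a changelog.
// None of these names come from Kafka Streams.
class CountingStoreSketch {
    // Stand-in for the local RocksDB store: latest count per key.
    final Map<String, Long> store = new TreeMap<>();
    // Stand-in for the changelog topic: append-only (key, count) updates.
    final List<Map.Entry<String, Long>> changelog;

    CountingStoreSketch(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
    }

    // Count one word: update the local store, then log the new value.
    void process(String word) {
        long updated = store.merge(word, 1L, Long::sum);
        changelog.add(new AbstractMap.SimpleEntry<>(word, updated));
    }

    // Rebuild a store on a "new machine" by replaying the changelog.
    // Later records for a key overwrite earlier ones.
    static CountingStoreSketch restore(List<Map.Entry<String, Long>> changelog) {
        CountingStoreSketch restored = new CountingStoreSketch(changelog);
        for (Map.Entry<String, Long> record : changelog) {
            restored.store.put(record.getKey(), record.getValue());
        }
        return restored;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = new ArrayList<>();
        CountingStoreSketch task = new CountingStoreSketch(changelog);

        task.process("hello");
        task.process("world");
        task.process("hello");
        System.out.println(task.store);   // {hello=2, world=1}
        System.out.println(changelog);    // [hello=1, world=1, hello=2]

        // Simulate deleting the changelog data; the local store is untouched.
        changelog.clear();
        task.process("hello");
        System.out.println(changelog);    // [hello=3]

        // Simulate a failure: the only source for restoration is the changelog.
        CountingStoreSketch restored = restore(changelog);
        System.out.println(restored.store); // {hello=3}
    }
}
```

Run as is, main walks through the scenario from the original question: the
store keeps counting from 2 to 3 after the changelog is emptied, but a store
rebuilt from the emptied changelog only knows "hello" => 3 and has lost
"world".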

Guozhang

On Tue, Apr 19, 2016 at 10:16 AM, Guillermo Lammers Corral <
guillermo.lammers.cor...@tecsisa.com> wrote:

> Hello,
>
> I read in the docs that Kafka Streams stores the computed aggregations in a
> local embedded key-value store (RocksDB by default), i.e., Kafka Streams
> provides so-called state stores. I'm wondering about the relationship
> between each state store and its replicated changelog Kafka topic.
>
> If we use the WordCountDemo example, I would like to know what the
> sequence of events is, to understand both concepts, when we execute
> something like this:
>
> I send the words "hello", "world", "hello" to the topic.
>
> Changelog topic messages:
>
> "hello" => 1
> "world" => 1
> "hello" => 2
>
> It's ok.
>
> Q1) What is the status of RocksDB at this moment?
>
> Q2) If I delete all data in the changelog Kafka topic and send a new
> "hello", I can see in the changelog topic:
>
> "hello" => 3
>
> What is happening? Is CountByKey counting using the RocksDB data stored
> prior to updating the changelog again?
>
> Q3) Could someone explain to me in depth the following process, step by
> step? What happens if the changelog Kafka topic is deleted, as I did in
> the question above?
>
> "If tasks run on a machine that fails and are restarted on another machine,
> Kafka Streams guarantees to restore their associated state stores to the
> content before the failure by replaying the corresponding changelog topics
> prior to resuming the processing on the newly started tasks."
>
> Q4) What are the differences for operations without an associated changelog
> Kafka topic, like joins between two KTables, when a machine failure occurs
> and we need a fault-tolerance policy?
>
> Thanks!
>

-- 
-- Guozhang
