On a clean restart on the same machine, the local RocksDB will just be
reused as it contains the complete state. Thus there is no need to read
the changelog topic at all.
The changelog topic is only read when state is moved from one node to
another, or when the state got corrupted due to a failure.
I understand now. The commit triggers the output of the window data,
whether or not the window is complete. For example, if I use .print() as
you suggest:
[KSTREAM-AGGREGATE-03]: [kafka@148363192] , (9<-null)
[KSTREAM-AGGREGATE-03]: [kafka@1483631925000] , (5<-null)
There is no such thing as a final window aggregate and you might see
intermediate results -- thus the counts do not add up.
Please have a look here:
http://stackoverflow.com/questions/38935904/how-to-send-final-kafka-streams-aggregation-result-of-a-time-windowed-ktable/38945277#38945277
and
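The "intermediate results" point can be seen in plain Java: each windowed emission is an update for its window, and only the latest value per window start is the current count. This is a standalone sketch with made-up window starts and counts, not the Streams API itself:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WindowUpdates {
    // Collapse a stream of (windowStart, count) emissions to the latest
    // count per window -- later updates for the same window supersede
    // earlier (intermediate) ones.
    static Map<Long, Long> latest(long[][] emissions) {
        Map<Long, Long> latestPerWindow = new LinkedHashMap<>();
        for (long[] e : emissions) {
            latestPerWindow.put(e[0], e[1]);
        }
        return latestPerWindow;
    }

    public static void main(String[] args) {
        // Illustration values only: two updates for one window, one for another.
        long[][] emissions = {
            {1483631925000L, 5},  // intermediate update
            {1483631925000L, 9},  // later update for the same window wins
            {1483631930000L, 3}
        };
        System.out.println(latest(emissions));
    }
}
```

This is why summing everything you see printed overcounts: the 5 and the 9 above belong to the same window, and only the 9 is current.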
I'm hoping the DSL will do what I want :) Currently the example is
continuously adding instead of bucketing, so if I modify it by adding a
window to the count function:
.groupBy((key, word) -> word)
.count(TimeWindows.of(5000L), "Counts")
.toStream((k, v) -> k.key());
Then I do see bucketing
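The bucketing from TimeWindows.of(5000L) boils down to assigning each record timestamp to the start of the 5-second window containing it. A minimal plain-Java sketch of that assignment (not the Streams API, just the arithmetic for non-overlapping windows):

```java
public class WindowBuckets {
    // Start of the tumbling window of the given size (ms) that contains
    // the timestamp: round the timestamp down to a multiple of the size.
    static long windowStart(long timestampMs, long sizeMs) {
        return (timestampMs / sizeMs) * sizeMs;
    }

    public static void main(String[] args) {
        long size = 5000L;
        System.out.println(windowStart(1483631925000L, size)); // 1483631925000
        System.out.println(windowStart(1483631927300L, size)); // 1483631925000 (same bucket)
        System.out.println(windowStart(1483631930001L, size)); // 1483631930000 (next bucket)
    }
}
```

All records whose timestamps round down to the same start are counted together in one window.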
Do you know about Kafka Streams? Its DSL gives you exactly what you
want to do.
Check out the documentation and WordCount example:
http://docs.confluent.io/current/streams/index.html
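What the WordCount example computes can be sketched in plain Java over a fixed batch of lines (the real example does the same thing incrementally over an unbounded Kafka topic; the method names below are illustrative, not the Streams API):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Per-word counts over a batch of lines: split each line into words,
    // lowercase them, group identical words, and count the group sizes.
    static Map<String, Long> wordCounts(String[] lines) {
        return Arrays.stream(lines)
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
            .filter(w -> !w.isEmpty())
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCounts(new String[]{"hello kafka", "hello streams"}));
        // hello=2, kafka=1, streams=1 (map order may vary)
    }
}
```

The DSL version additionally keeps the counts in a fault-tolerant state store and emits updates as new records arrive.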
Hello,
I'm looking for guidance on how to approach a counting problem. We want to
consume a stream of data that consists of IDs and generate an output of the
aggregated count with a window size of X seconds using processing time and
a hopping time window. For example, using a window size of 1