Re: Aggregated windowed counts

2017-01-05 Thread Matthias J. Sax
On a clean restart on the same machine, the local RocksDB will just be reused as it contains the complete state. Thus there is no need to read the changelog topic at all. The changelog topic is only read when a state is moved from one node to another, or the state got corrupted due to a failure.

Re: Aggregated windowed counts

2017-01-05 Thread Benjamin Black
I understand now. The commit triggers the output of the window data, whether or not the window is complete. For example, if I use .print() as you suggest:

[KSTREAM-AGGREGATE-03]: [kafka@148363192], (9<-null)
[KSTREAM-AGGREGATE-03]: [kafka@1483631925000], (5<-null)

Re: Aggregated windowed counts

2017-01-04 Thread Matthias J. Sax
There is no such thing as a final window aggregate and you might see intermediate results -- thus the counts do not add up. Please have a look here: http://stackoverflow.com/questions/38935904/how-to-send-final-kafka-streams-aggregation-result-of-a-time-windowed-ktable/38945277#38945277 and
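[Editorial note: because each commit can emit an updated, not final, count for the same window, a downstream consumer that sums the emitted records will over-count; keeping only the latest value per (window, key) yields the effective result. A minimal Python sketch of that dedup step -- names are illustrative, not part of the Kafka Streams API:]

```python
def latest_per_window(updates):
    """updates: ordered (window_start_ms, key, count) records, in the order a
    consumer would read them from the output topic. A later record for the
    same (window_start_ms, key) supersedes the earlier one."""
    final = {}
    for window_start, key, count in updates:
        final[(window_start, key)] = count  # last write wins
    return final
```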

Re: Aggregated windowed counts

2017-01-04 Thread Benjamin Black
I'm hoping the DSL will do what I want :) Currently the example is continuously adding instead of bucketing, so if I modify it by adding a window to the count function:

.groupBy((key, word) -> word)
.count(TimeWindows.of(5000L), "Counts")
.toStream((k, v) -> k.key());

Then I do see bucketing
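[Editorial note: TimeWindows.of(5000L) with no advance is a tumbling window, so each record lands in exactly one epoch-aligned 5-second bucket. The bucket assignment can be sketched in Python -- this illustrates the semantics, not the Streams internals:]

```python
from collections import Counter

def bucket_of(ts_ms, size_ms=5000):
    # tumbling windows are aligned to the epoch: exactly one bucket per timestamp
    return (ts_ms // size_ms) * size_ms

def tumbling_counts(events, size_ms=5000):
    """events: iterable of (timestamp_ms, word) pairs.
    Returns a Counter keyed by (bucket_start_ms, word)."""
    counts = Counter()
    for ts, word in events:
        counts[(bucket_of(ts, size_ms), word)] += 1
    return counts
```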

Re: Aggregated windowed counts

2017-01-04 Thread Matthias J. Sax
Do you know about Kafka Streams? Its DSL gives you exactly what you want to do. Check out the documentation and WordCount example: http://docs.confluent.io/current/streams/index.html

Aggregated windowed counts

2017-01-04 Thread Benjamin Black
Hello, I'm looking for guidance on how to approach a counting problem. We want to consume a stream of data that consists of IDs and generate an output of the aggregated count with a window size of X seconds using processing time and a hopping time window. For example, using a window size of 1
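[Editorial note: the core of the question -- which hopping windows a given record falls into, and the per-window counts that result -- can be sketched in a few lines of Python. With window size s and advance a, a timestamp belongs to every window whose interval [start, start + s) covers it; the names below are illustrative:]

```python
from collections import Counter

def window_starts(ts_ms, size_ms, advance_ms):
    """All hopping-window start times whose interval [start, start + size_ms)
    contains ts_ms."""
    starts = []
    start = (ts_ms // advance_ms) * advance_ms  # latest window that can contain ts_ms
    while start >= 0 and start + size_ms > ts_ms:
        starts.append(start)
        start -= advance_ms
    return list(reversed(starts))

def hopping_counts(events, size_ms, advance_ms):
    """events: iterable of (timestamp_ms, id) pairs (processing time).
    Returns a Counter keyed by (window_start_ms, id)."""
    counts = Counter()
    for ts, key in events:
        for start in window_starts(ts, size_ms, advance_ms):
            counts[(start, key)] += 1
    return counts
```

When advance_ms equals size_ms, this degenerates to tumbling windows and each record is counted exactly once.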