Re: Deduplicating KStream-KStream join

2017-05-06 Thread Ofir Sharony
Max, thanks for your detailed answer. A couple of comments/questions: 1. As you mentioned, caching a KTable to reduce the number of duplicates doesn't provide a 100% solution. From the docs: *"The semantics of caching is that data is flushed to the state sto
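For context, the record cache in the quoted docs is controlled by two Streams settings: `cache.max.bytes.buffering` (total cache memory across all threads) and `commit.interval.ms` (the cache is flushed on commit). Below is a minimal sketch that assumes plain `java.util.Properties` and uses illustrative values. A larger cache and a longer commit interval usually mean fewer intermediate updates downstream, but, as the docs say, they do not guarantee deduplication.

```java
import java.util.Properties;

public class CacheConfigSketch {
    // Builds Streams properties that trade latency for fewer intermediate updates.
    // The values passed in are illustrative, not recommendations from the thread.
    static Properties cacheProps(long cacheBytes, long commitIntervalMs) {
        Properties props = new Properties();
        // Equivalent to StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG
        props.put("cache.max.bytes.buffering", Long.toString(cacheBytes));
        // Equivalent to StreamsConfig.COMMIT_INTERVAL_MS_CONFIG; the cache is flushed on commit
        props.put("commit.interval.ms", Long.toString(commitIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        Properties p = cacheProps(10L * 1024 * 1024, 30_000L);
        System.out.println(p.getProperty("cache.max.bytes.buffering"));
    }
}
```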

Re: Verify time semantics through topology

2017-05-06 Thread Matthias J. Sax
About the join: joins work perfectly fine if you apply them to "plain records" you read from a topic. When joining records, the records' timestamps are used to compute the join result. The "problem" in your case is that you apply the join to a windowed aggregation result. As a result, there is no "reco
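Here is a concrete version of the timestamp point. In a windowed KStream-KStream join, two records with the same key are joined only if their timestamps fall within the join window. The sketch below shows that check. It assumes a symmetric window like the one `JoinWindows.of(timeDifferenceMs)` creates. It models the rule only and does not call the Kafka Streams API.

```java
public class JoinWindowSketch {
    // Models JoinWindows.of(timeDifferenceMs): same-key records join when the
    // right record's timestamp lies in [leftTs - diff, leftTs + diff] (inclusive).
    static boolean joinable(long leftTs, long rightTs, long timeDifferenceMs) {
        return rightTs >= leftTs - timeDifferenceMs && rightTs <= leftTs + timeDifferenceMs;
    }

    public static void main(String[] args) {
        System.out.println(joinable(1_000L, 1_400L, 500L)); // true
        System.out.println(joinable(1_000L, 1_600L, 500L)); // false
    }
}
```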

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-06 Thread Eno Thereska
Hi there, I wanted to add something: how many CPU cores does each of your Kubernetes instances have? In 0.10.2.1 we noticed a regression in environments with 1 core, as described in https://issues.apache.org/jira/browse/KAFKA-5174 . If you have
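A quick way to answer the core-count question from inside the application is to ask the JVM how many processors it sees, as in the minimal sketch below. Older JVMs (before container support arrived in JDK 10 and 8u191) may report the host's core count rather than the container's CPU limit, so the number can differ from the Kubernetes resource setting.

```java
public class CoreCheck {
    // Returns the number of processors visible to this JVM.
    static int visibleCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // True for the single-core case the regression above concerns.
    static boolean isSingleCore(int cores) {
        return cores <= 1;
    }

    public static void main(String[] args) {
        int cores = visibleCores();
        System.out.println("Visible cores: " + cores + (isSingleCore(cores) ? " (single core)" : ""));
    }
}
```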

Re: Kafka-streams process stopped processing messages

2017-05-06 Thread Eno Thereska
Yeah we’ve seen cases when the session timeout might also need increasing. Could you try upping it to something like 6ms and let us know how it goes: >> streamsProps.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 6); Thanks Eno > On May 6, 2017, at 8:35 AM, Shimi Kiviti wrote: > > Tha
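A small helper can keep the consumer liveness settings consistent with each other. The sketch below assumes plain `Properties` and follows the Kafka consumer guidance that `heartbeat.interval.ms` be no higher than one third of `session.timeout.ms`. The broker rejects session timeouts outside `group.min.session.timeout.ms` and `group.max.session.timeout.ms`, so the helper takes those bounds as parameters. All numbers in `main` are placeholders, not the figure from the message.

```java
import java.util.Properties;

public class SessionTimeoutSketch {
    // Sets session.timeout.ms and a heartbeat interval of one third of it,
    // after checking the value against the broker's allowed range.
    static Properties livenessProps(int sessionTimeoutMs, int brokerMinMs, int brokerMaxMs) {
        if (sessionTimeoutMs < brokerMinMs || sessionTimeoutMs > brokerMaxMs) {
            throw new IllegalArgumentException(
                "session.timeout.ms must be within [" + brokerMinMs + ", " + brokerMaxMs + "]");
        }
        Properties props = new Properties();
        // Equivalent to ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG
        props.put("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // Equivalent to ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG; at most 1/3 of the session timeout
        props.put("heartbeat.interval.ms", Integer.toString(sessionTimeoutMs / 3));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; check your broker's group.min/max.session.timeout.ms
        Properties p = livenessProps(30_000, 6_000, 300_000);
        System.out.println(p.getProperty("heartbeat.interval.ms"));
    }
}
```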

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-06 Thread Sachin Mittal
A note on a few things. Set the changelog topic's delete retention time as low as possible if the previous values for the same key are not needed and can be safely cleaned up. Also set the segment size and segment retention time low, so that older segments can be compacted and cleaned up. Set delete ratio to be aggres
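These suggestions map onto topic-level configs of the changelog topic. The sketch below builds such a config map under several assumptions. It reads "delete retention time" as `delete.retention.ms`, "segment size" and "segment retention time" as `segment.bytes` and `segment.ms`, and "delete ratio" as `min.cleanable.dirty.ratio`. All values are hypothetical. The active segment is never compacted, which is why smaller or shorter segments let the cleaner reach older data sooner.

```java
import java.util.HashMap;
import java.util.Map;

public class ChangelogConfigSketch {
    // Topic-level configs that make compaction of a changelog topic more aggressive.
    // All values are hypothetical; tune them against your own retention needs.
    static Map<String, String> aggressiveCompaction() {
        Map<String, String> cfg = new HashMap<>();
        // How long delete markers (tombstones) are kept after compaction
        cfg.put("delete.retention.ms", Long.toString(60_000L));
        // Smaller/shorter segments roll sooner; the active segment is never compacted
        cfg.put("segment.bytes", Integer.toString(64 * 1024 * 1024));
        cfg.put("segment.ms", Long.toString(60L * 60 * 1000));
        // A lower ratio lets the log cleaner run with less dirty data accumulated
        cfg.put("min.cleanable.dirty.ratio", "0.1");
        return cfg;
    }

    public static void main(String[] args) {
        aggressiveCompaction().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```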

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-06 Thread Shimi Kiviti
This is very similar to issues that we see. Did you check the status of the consumer group? In my case it is rebalancing most of the time. Once in a while it shows consumers and offsets, but after a short time it goes back to rebalancing. How much storage does your Kafka-streams use? A
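To answer the storage question, one option is to sum the size of the Streams state directory on each instance. The sketch below does this with `java.nio.file`. The path is an assumption: `state.dir` defaults to `/tmp/kafka-streams` in this Kafka version, but deployments often override it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StateDirSize {
    // Sums the sizes of all regular files under the given directory (recursively).
    static long totalBytes(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location; use the state.dir configured for your application
        Path stateDir = Paths.get(args.length > 0 ? args[0] : "/tmp/kafka-streams");
        if (Files.exists(stateDir)) {
            System.out.println(stateDir + ": " + totalBytes(stateDir) + " bytes");
        } else {
            System.out.println(stateDir + " does not exist");
        }
    }
}
```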

Re: Kafka-streams process stopped processing messages

2017-05-06 Thread Shimi Kiviti
Thanks Eno, I already set the receive buffer size to 1MB. I will also try the producer. What about the session timeout and heartbeat timeout? Do you think they should be increased? Thanks, Shimi On Sat, 6 May 2017 at 0:21 Eno Thereska wrote: > Hi Shimi, > > I’ve noticed with our benchmarks that on AW
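The receive buffer change can be captured as a client property, as in the sketch below. It assumes plain `Properties` and reads "1MB" as 1 MiB (1,048,576 bytes). How the value reaches the embedded consumer (directly or through a client prefix) depends on how the Streams configuration is built, so treat that part as an assumption too.

```java
import java.util.Properties;

public class BufferConfigSketch {
    // Assumes "1MB" in the message means 1 MiB
    static final int ONE_MIB = 1024 * 1024;

    // Builds client properties with an explicit TCP receive buffer (SO_RCVBUF).
    // A value of -1 tells the client to use the OS default instead.
    static Properties receiveBufferProps(int receiveBytes) {
        if (receiveBytes < -1) {
            throw new IllegalArgumentException("receive.buffer.bytes must be >= -1");
        }
        Properties props = new Properties();
        // Equivalent to ConsumerConfig.RECEIVE_BUFFER_CONFIG
        props.put("receive.buffer.bytes", Integer.toString(receiveBytes));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(receiveBufferProps(ONE_MIB).getProperty("receive.buffer.bytes"));
    }
}
```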