Re: Single Key Aggregation

2017-06-23 Thread Matthias J. Sax
You can look into KStreamReduceProcessor#process() and KStreamAggregateProcessor#process(). This should help to see the input data for each instance. count()/aggregate() will use AggregateProcessor, while reduce() will use ReduceProcessor. Hope this helps. -Matthias On 6/20/17 10:54 PM, Sameer K...
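For readers following along, here is a minimal sketch, assuming the 0.10.2.x DSL and hypothetical topic and store names, of how the DSL calls map onto the processors named above: a breakpoint in KStreamReduceProcessor#process() is hit by the reduce() branch, and one in KStreamAggregateProcessor#process() by the count() branch, which shows the records each instance actually handles.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    public class ProcessorMappingSketch {
        public static void main(String[] args) {
            KStreamBuilder builder = new KStreamBuilder();
            // hypothetical input topic with String keys and Integer values
            KStream<String, Integer> clicks =
                    builder.stream(Serdes.String(), Serdes.Integer(), "clicks-topic");

            // reduce() is executed by KStreamReduceProcessor#process()
            KTable<String, Integer> sums = clicks
                    .groupByKey(Serdes.String(), Serdes.Integer())
                    .reduce((v1, v2) -> v1 + v2, "sum-store");

            // count() (like aggregate()) is executed by KStreamAggregateProcessor#process()
            KTable<String, Long> counts = clicks
                    .groupByKey(Serdes.String(), Serdes.Integer())
                    .count("count-store");
        }
    }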

Re: Single Key Aggregation

2017-06-20 Thread Sameer Kumar
Hi Matthias, I am working on 10.2.1, and I do see the same outputs on the same keys, albeit not always and a bit randomly. I shall try to code a simpler example, free from domain-level code, and share it with you in a while. I am willing to debug this further as well, if you could tell me the classes that I...

Re: Single Key Aggregation

2017-06-19 Thread Matthias J. Sax
Hi Sameer, With regard to >>> What I saw was that while on Machine1 the counter was 100, on another >>> machine it was at 1. I saw it as inconsistent. If you really see the same key on different machines, that would be incorrect. All records with the same key must be processed by the same machine...
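As an illustration of that guarantee, here is a simplified sketch of the principle, not Kafka's actual partitioner code (the real default partitioner hashes the serialized key bytes with murmur2): a given key always maps to the same partition, and each partition is assigned to exactly one Streams instance at a time, so one key cannot be aggregated on two machines unless the key or the partitioning changes upstream.

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class KeyToPartitionSketch {
        // simplified stand-in for the default partitioner; Kafka itself
        // uses murmur2 over the serialized key bytes
        static int partitionFor(String key, int numPartitions) {
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        }

        public static void main(String[] args) {
            int partitions = 60; // the input topic in this thread has 60 partitions
            // same key, same partition, hence same processing instance
            System.out.println(partitionFor("lineItem42_2017-06-13", partitions));
            System.out.println(partitionFor("lineItem42_2017-06-13", partitions));
        }
    }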

Re: Single Key Aggregation

2017-06-17 Thread Sameer Kumar
Continued from my last mail... The code snippet that I shared was after joining impression and notification logs. Here I am picking the line item and concatenating it with the date. You can also see there is a check for a TARGETED_LINE_ITEM; I am not emitting the data otherwise. -Sameer. On Sat, Jun...

Re: Single Key Aggregation

2017-06-17 Thread Sameer Kumar
The example I gave was just for illustration. I have impression logs and notification logs. Notification logs are essentially tied to impressions served. An impression would serve multiple items. I was just trying to aggregate across a single line item, which means I am always generating a single key...

Re: Single Key Aggregation

2017-06-16 Thread Matthias J. Sax
I just double-checked your example code from an email before. There you are using:

    stream.flatMap(...)
          .groupBy((k, v) -> k, Serdes.String(), Serdes.Integer())
          .reduce((value1, value2) -> value1 + value2, ...

In your last email, you say that you want to count on a category that is contained in...
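To make that fragment self-contained, here is one possible completion, assuming the 0.10.2.x DSL; the topic names, the pass-through flatMap body, and the store name are made up for illustration:

    import java.util.Collections;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    public class ReducePipelineSketch {
        public static void main(String[] args) {
            KStreamBuilder builder = new KStreamBuilder();
            KStream<String, Integer> stream =
                    builder.stream(Serdes.String(), Serdes.Integer(), "joined-logs");

            KTable<String, Integer> perKeySums = stream
                    // placeholder flatMap: forwards each record unchanged
                    .flatMap((k, v) -> Collections.singletonList(KeyValue.pair(k, v)))
                    // regroup on the key emitted by flatMap (forces a repartition)
                    .groupBy((k, v) -> k, Serdes.String(), Serdes.Integer())
                    // sum the Integer values per key into the "sum-store" state store
                    .reduce((value1, value2) -> value1 + value2, "sum-store");

            perKeySums.to(Serdes.String(), Serdes.Integer(), "per-key-sums");
        }
    }

The groupBy() after flatMap() is what routes all records with the same (possibly new) key through one partition of an internal repartition topic, and therefore through one instance.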

Re: Single Key Aggregation

2017-06-15 Thread Sameer Kumar
Ok, let me try to explain it again. Let's say my source processor has a different key; the value it contains, let's say, has an identifier field that basically denotes a category, and I am counting on different categories. A specific case would be: I do a filter and output only a specific...
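A sketch of that shape under the 0.10.2.x DSL; the topic, the category literal, and the store name are hypothetical. The stream is filtered down to one category, re-keyed on that category, and then counted, so only a single key ever reaches the count:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    public class CategoryCountSketch {
        public static void main(String[] args) {
            KStreamBuilder builder = new KStreamBuilder();
            // hypothetical topic: key = original record id, value = category name
            KStream<String, String> events =
                    builder.stream(Serdes.String(), Serdes.String(), "events");

            KTable<String, Long> categoryCounts = events
                    // keep only the one targeted category
                    .filter((k, category) -> "TARGETED_LINE_ITEM".equals(category))
                    // re-key on the category so the count is per category
                    .selectKey((k, category) -> category)
                    .groupByKey(Serdes.String(), Serdes.String())
                    .count("category-count-store");
        }
    }

Because selectKey() changes the key, the groupByKey() writes through an internal repartition topic, which is what sends every record for that single category to one partition and therefore one instance.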

Re: Single Key Aggregation

2017-06-15 Thread Eno Thereska
I'm not sure if I fully understand this, but let me check:
- if you start 2 instances, one instance will process half of the partitions, and the other instance will process the other half
- for any given key, like key 100, it will only be processed on one of the instances, not both.
Does this help?
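Concretely, "two instances" here means running the same application on both machines with the same application.id; a minimal sketch with made-up config values:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    public class TwoInstanceSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // both machines must share this application.id to split the work
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-counter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.Integer().getClass().getName());

            KStreamBuilder builder = new KStreamBuilder();
            builder.stream(Serdes.String(), Serdes.Integer(), "clicks-topic")
                   .groupByKey(Serdes.String(), Serdes.Integer())
                   .count("count-store");

            // start the same program on machine 1 and machine 2:
            // the input partitions are split between the two instances,
            // and any single key is handled by exactly one of them
            new KafkaStreams(builder, props).start();
        }
    }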

Re: Single Key Aggregation

2017-06-14 Thread Sameer Kumar
Also, I am writing a single key in the output all the time. I believe machine 2 will have to write a key, and since the state store is local, it wouldn't know about the counter state on the other machine. So, I guess this will happen. -Sameer. On Thu, Jun 15, 2017 at 11:11 AM, Sameer Kumar wrote: > Th...

Re: Single Key Aggregation

2017-06-14 Thread Sameer Kumar
The input topic contains 60 partitions, and data is distributed well across the different partitions on different keys. While consuming, I am doing some filtering and writing only single-key data. The output would be something of the form: Machine 1: 2017-06-13 16:49:10 INFO LICountClickImprMR2:116 - l...

Single Key Aggregation

2017-06-13 Thread Sameer Kumar
Hi, I witnessed some strange behaviour in Kafka Streams and need help in understanding it. I created an application for aggregating clicks per user; I want to process it only for 1 user (I was writing only a single key). When I ran the application on one machine, it was running fine. Now, to load-balance...
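For context, a sketch of the topology described above, assuming the 0.10.2.x DSL and hypothetical topic, user id, and store names: clicks are filtered down to the one user and counted, so the whole job produces exactly one key.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    public class SingleUserClickCountSketch {
        public static void main(String[] args) {
            KStreamBuilder builder = new KStreamBuilder();
            // hypothetical topic: key = user id, value = a click event payload
            KStream<String, String> clicks =
                    builder.stream(Serdes.String(), Serdes.String(), "clicks");

            KTable<String, Long> clicksForOneUser = clicks
                    // keep clicks of a single, hard-coded user (hypothetical id)
                    .filter((userId, click) -> "user-100".equals(userId))
                    .groupByKey(Serdes.String(), Serdes.String())
                    .count("clicks-per-user-store");
        }
    }

With only one key left after the filter, all of its records live in one partition of the input topic, so when the application is load-balanced over two machines, only the instance that owns that partition should ever update the counter; seeing the counter grow on both machines points at the key or the partitioning being changed somewhere upstream.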