Re: Order of punctuate() and process() in a stream processor

2017-05-18 Thread Matthias J. Sax
Thanks Sini! I intended to create a new JIRA, but than changed my mind and just picky-backed it to the existing one, as it's highly related and we might be able to tackle it in one effort. -Matthias On 5/18/17 12:12 AM, Peter Sinoros Szabo wrote: > Hi Michal, > > yes, I know its beyond the

Re: Order of punctuate() and process() in a stream processor

2017-05-18 Thread Peter Sinoros Szabo
Hi Michal, yes, I know its beyond the scope of KIP-138, but from previous messages from Matthias I thought that he will create a new ticket, but it seems that instead he added it to KAFKA-3514. I will update that ticket with my thoughts. Thanks, Sini From: Michal Borowiecki

Re: Order of punctuate() and process() in a stream processor

2017-05-17 Thread Michal Borowiecki
Hi Sini, This is beyond the score of KIP-138 but https://issues.apache.org/jira/browse/KAFKA-3514 exists to track such improvements Thanks, Michal On 17 May 2017 5:10 p.m., Peter Sinoros Szabo wrote: Hi, I see, now its clear why the repeated punctuations

Re: Order of punctuate() and process() in a stream processor

2017-05-17 Thread Peter Sinoros Szabo
Hi, I see, now its clear why the repeated punctuations use the same time value in that case. Do you have a JIRA ticket to track improvement ideas for that? It would be great to have an option to: - advance the stream time before calling the process() on a new record - this would prevent to

Re: Order of punctuate() and process() in a stream processor

2017-05-14 Thread Mahendra Kariya
We use Kafka Streams for quite a few aggregation tasks. For instance, counting the number of messages with a particular status in a 1-minute time window. We have noticed that whenever we restart a stream, we see a sudden spike in the aggregated numbers. After a few minutes, things are back to

Re: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Matthias J. Sax
I added the feedback to https://issues.apache.org/jira/browse/KAFKA-3514 -Matthias On 5/12/17 10:38 AM, Thomas Becker wrote: > Thanks. I think the system time based punctuation scheme we were discussing > would not result in repeated punctuations like this, but even using stream > time it

RE: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Thomas Becker
Thanks. I think the system time based punctuation scheme we were discussing would not result in repeated punctuations like this, but even using stream time it seems a bit odd. If you do anything in a punctuate call that is relatively expensive it's especially bad.

Re: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Matthias J. Sax
Thanks for sharing. As punctuate is called with "streams time" you see the same time value multiple times. It's again due to the coarse grained advance of "stream time". @Thomas: I think, the way we handle it just simplifies the implementation of punctuations. I don't see any other "advantage".

RE: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Peter Sinoros Szabo
Well, this is also a good question, because it is triggered with the same timestamp 3 times, so in order to create my update for both three seconds, I will have to count the number of punctuations and calculate the missed stream times for myself. It's ok for me to trigger it 3 times, but the

RE: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Thomas Becker
I'm a bit troubled by the fact that it fires 3 times despite the stream time being advanced all at once; is there a scenario when this is beneficial? From: Matthias J. Sax [matth...@confluent.io] Sent: Friday, May 12, 2017 12:38 PM To:

Re: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Peter Sinoros Szabo
Hi, It is actually critical for me (and of course I would like to understand it too), because using this design the "first message" will be processed in a bad window/segment (I mean the timeframe between the punctuate() calls): it will be processed in the "3 seconds before segment" instead of

Re: Order of punctuate() and process() in a stream processor

2017-05-12 Thread Matthias J. Sax
Hi Peter, It's by design. Streams internally tracks time progress (so-called "streams time"). "streams time" get advanced *after* processing a record. Thus, in your case, "stream time" is still at its old value before it processed the first message of you send "burst". After that, "streams time"

Order of punctuate() and process() in a stream processor

2017-05-12 Thread Peter Sinoros Szabo
Hi, Let's assume the following case. - a stream processor that uses the Processor API - context.schedule(1000) is called in the init() - the processor reads only one topic that has one partition - using custom timestamp extractor, but that timestamp is just a wall clock time Image the