Yes we are considering to differentiate "process" and "punctuate" function
to be "data-driven" and "time-driven" computations. That is, triggering of
punctuate should NOT be depending on the arrival of messages, or the
message's associated timestamps.
As for now I think periodically inserting the
I know that the Kafka team is working on a new way to reason about time. My
team's solution was to not use punctuate...but this only works if you have
guarantees that all of the tasks will receive messages..if not all the
partitions. Another solution is to periodically send canaries to all
pa
Thanks for the comments.
@David: yes, I have a source which is reading data from two topics and one
of them were empty while the second one was loaded with plenty of data.
So what do you suggest to solve this ?
Here is snippet of my code:
StreamsConfig config = new StreamsConfig(configProperties);
If you are consuming from more than one topic/partition, punctuate is triggered
when the “smallest” time-value changes. So, if there is a partition that
doesn’t have any more messages on it, it will always have the smallest
time-value and that time value won’t change…hence punctuate never gets
Your understanding is correct:
Punctuate is not triggered base on wall-clock time, but based in
internally tracked "stream time" that is derived from TimestampExtractor.
Even if you use WallclockTimestampExtractor, "stream time" is only
advance if there are input records.
Not sure why punctuate()
Hello,
I am using low level processor and I set the context.schedule(1),
assuming that punctuate() method is invoked every 10 sec .
I have set
configProperties.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
WallclockTimestampExtractor.class.getCanonicalName()) )
Although data is keep co
Thanks for the follow-up and the bug report, David.
We're taking a look at that.
On Mon, Oct 10, 2016 at 4:36 PM, David Garcia wrote:
> Thx for the responses. I was able to identify a bug in how the times are
> obtained (offsets resolved as unknown cause the issue):
>
> “Actually, I think th
Thx for the responses. I was able to identify a bug in how the times are
obtained (offsets resolved as unknown cause the issue):
“Actually, I think the bug is more subtle. What happens when a consumed topic
stops receiving messages? The smallest timestamp will always be the static
timestamp
> We have run the application (and have confirmed data is being received)
for over 30 mins…with a 60-second timer.
Ok, so your app does receive data but punctuate() still isn't being called.
:-(
> So, do we need to just rebuild our cluster with bigger machines?
That's worth trying out. See
htt
Yeah, this is possible. We have run the application (and have confirmed data
is being received) for over 30 mins…with a 60-second timer. So, do we need to
just rebuild our cluster with bigger machines?
-David
On 10/7/16, 11:18 AM, "Michael Noll" wrote:
David,
punctuate() is sti
David,
punctuate() is still data-driven at this point, even when you're using the
WallClock timestamp extractor.
To use an example: Imagine you have configured punctuate() to be run every
5 seconds. If there's no data being received for a minute, then punctuate
won't be called -- even though you
Hello, I’m sure this question has been asked many times.
We have a test-cluster (confluent 3.0.0 release) of 3 aws m4.xlarges. We have
an application that needs to use the punctuate() function to do some work on a
regular interval. We are using the WallClock extractor. Unfortunately, the
meth
12 matches
Mail list logo