Re: proper way to manage watermarks with messages combining multiple timestamps

2021-04-19 Thread Arvid Heise
Hi Mathieu,

The easiest way is to already emit several inputs on the source level. If
you use DeserializationSchema, try to use the method with the collector.
The watermarks should then be generated as if you would only receive one
element at a time.

On Sun, Apr 18, 2021 at 11:08 AM Mathieu D  wrote:

> Hi,
>
> I can't change the way devices send their data. We are constrained in the
> messages sent per day per device.
>
> To illustrate my question:
> - at 9:08 a message is emitted. It packs together several measures:
> - measure m1 taken at 8:52
> - measure m2 taken at 9:07
>
> m1 must go in the 8:00-9:00 aggregation
> m2 in the 9:00-10:00 aggregation
>
> What's the proper way to set the watermarks in such a case ?
>
> Thanks for your insights !
>
> Mathieu
>
> Le sam. 17 avr. 2021 à 07:05, Lasse Nedergaard <
> lassenedergaardfl...@gmail.com> a écrit :
>
>> Hi
>>
>> One thing to remember is that Flinks watermark is global this mean it’s
>> shared between all keys (in your case ioT Devices) so the first requirement
>> your have is to ensure the timestamp is aligned or almost aligned between
>> yours IoT devices if not Flink’s watermark is hard to use.
>>
>> Med venlig hilsen / Best regards
>> Lasse Nedergaard
>>
>>
>> > Den 16. apr. 2021 kl. 18.29 skrev Mathieu D :
>> >
>> > 
>> > Hello,
>> >
>> > I'm totally new to Flink, and I'd like to make sure I understand things
>> properly around watermarks.
>> >
>> > We're processing messages from iot devices.
>> > Those messages have a timestamp, and we have a first phase of
>> processing based on this timestamp. So far so good.
>> >
>> > These messages actually "pack" together several measures taken at
>> different times, typically going from ~15mn back in the past from the
>> message timestamp, to a few seconds back.
>> >
>> > So at a point in the processing, I'll flatMap the message stream into a
>> stream of measures, and I'll first need to reaffect the event time. I guess
>> I can do it using a TimestampAssigner, correct ?
>> >
>> > The flatmapped stream will now mix together a large range of
>> event-times (so, a span of 15mn). What should I do regarding the watermark
>> ? Should I regenerate one ? and how ?
>> >
>> > My measures will go through windowed aggregations. Should I use the
>> allowedLateness param to manage that properly ?
>> > (Note: I'm ok with windows firing several times with updated content,
>> if that matters. Our downstream usage is made for that.)
>> >
>> > Thanks a lot for your insights and pointers :-)
>> >
>> > Mathieu
>> >
>> >
>> >
>>
>


Re: proper way to manage watermarks with messages combining multiple timestamps

2021-04-18 Thread Mathieu D
Hi,

I can't change the way devices send their data. We are constrained in the
messages sent per day per device.

To illustrate my question:
- at 9:08 a message is emitted. It packs together several measures:
- measure m1 taken at 8:52
- measure m2 taken at 9:07

m1 must go in the 8:00-9:00 aggregation
m2 in the 9:00-10:00 aggregation

What's the proper way to set the watermarks in such a case ?

Thanks for your insights !

Mathieu

Le sam. 17 avr. 2021 à 07:05, Lasse Nedergaard <
lassenedergaardfl...@gmail.com> a écrit :

> Hi
>
> One thing to remember is that Flinks watermark is global this mean it’s
> shared between all keys (in your case ioT Devices) so the first requirement
> your have is to ensure the timestamp is aligned or almost aligned between
> yours IoT devices if not Flink’s watermark is hard to use.
>
> Med venlig hilsen / Best regards
> Lasse Nedergaard
>
>
> > Den 16. apr. 2021 kl. 18.29 skrev Mathieu D :
> >
> > 
> > Hello,
> >
> > I'm totally new to Flink, and I'd like to make sure I understand things
> properly around watermarks.
> >
> > We're processing messages from iot devices.
> > Those messages have a timestamp, and we have a first phase of processing
> based on this timestamp. So far so good.
> >
> > These messages actually "pack" together several measures taken at
> different times, typically going from ~15mn back in the past from the
> message timestamp, to a few seconds back.
> >
> > So at a point in the processing, I'll flatMap the message stream into a
> stream of measures, and I'll first need to reaffect the event time. I guess
> I can do it using a TimestampAssigner, correct ?
> >
> > The flatmapped stream will now mix together a large range of event-times
> (so, a span of 15mn). What should I do regarding the watermark ? Should I
> regenerate one ? and how ?
> >
> > My measures will go through windowed aggregations. Should I use the
> allowedLateness param to manage that properly ?
> > (Note: I'm ok with windows firing several times with updated content, if
> that matters. Our downstream usage is made for that.)
> >
> > Thanks a lot for your insights and pointers :-)
> >
> > Mathieu
> >
> >
> >
>


proper way to manage watermarks with messages combining multiple timestamps

2021-04-16 Thread Mathieu D
Hello,

I'm totally new to Flink, and I'd like to make sure I understand things
properly around watermarks.

We're processing messages from iot devices.
Those messages have a timestamp, and we have a first phase of processing
based on this timestamp. So far so good.

These messages actually "pack" together several measures taken at different
times, typically going from ~15mn back in the past from the message
timestamp, to a few seconds back.

So at a point in the processing, I'll flatMap the message stream into a
stream of measures, and I'll first need to reaffect the event time. I guess
I can do it using a TimestampAssigner, correct ?

The flatmapped stream will now mix together a large range of event-times
(so, a span of 15mn). What should I do regarding the watermark ? Should I
regenerate one ? and how ?

My measures will go through windowed aggregations. Should I use the
allowedLateness param to manage that properly ?
(Note: I'm ok with windows firing several times with updated content, if
that matters. Our downstream usage is made for that.)

Thanks a lot for your insights and pointers :-)

Mathieu