Re: [Discuss] Semantics of event time for state TTL

2019-04-16 Thread Yu Li
Thanks for initiating the discussion and wrapping up the conclusion, Andrey, and thanks all for participating. Just to confirm: for the out-of-order case, the conclusion is to update the data and timestamp with the currently-being-processed record without checking whether it is old data, right? In

Re: [Discuss] Semantics of event time for state TTL

2019-04-15 Thread Andrey Zagrebin
Hi everybody, Thanks a lot for your detailed feedback on this topic. It looks like we can already do some preliminary wrap-up for this discussion. As far as I see we have the following trends:
Last access timestamp: Event timestamp of currently being processed record
Current timestamp to

Re: [Discuss] Semantics of event time for state TTL

2019-04-09 Thread aitozi
Hi Andrey, I think TTL state has another use case: simulating a sliding window with a process function. A user can define a state that stores the data of the latest 1 day and trigger a calculation on that state every 5 min. It is an operator similar to a sliding window, but I think it is more efficient than

Re: [Discuss] Semantics of event time for state TTL

2019-04-09 Thread Aljoscha Krettek
I think so, I just wanted to bring it up again because the question was raised.
> On 8. Apr 2019, at 22:56, Elias Levy wrote:
>
> Hasn't this always been the end goal? It's certainly what we have been
> waiting on for jobs with very large TTLed state. Beyond timer storage,
> timer processing

Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Elias Levy
Hasn't this always been the end goal? It's certainly what we have been waiting on for jobs with very large TTLed state. Beyond timer storage, timer processing to simply expire stale data that may not be accessed otherwise is expensive.
On Mon, Apr 8, 2019 at 7:11 AM Aljoscha Krettek wrote:
> I

Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Aljoscha Krettek
I had a discussion with Andrey and now think that the event-time-timestamp/watermark-cleanup case is also a valid one. It applies if you don’t need this for regulatory compliance but just for cleaning up old state, in cases where you re-process old data. I think the discussion about whether to

Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Kostas Kloudas
Hi all, For GDPR: I am not sure about the regulatory requirements of GDPR but I would assume that the time for deletion starts counting from the time an organisation received the data (i.e. the wall-clock ingestion time of the data), and not the "event time" of the data. In other case, an

Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Aljoscha Krettek
Oh boy, this is an interesting pickle. For *last-access-timestamp*, I think only *event-time-of-current-record* makes sense. I’m looking at this from a GDPR/regulatory compliance perspective. If you update a state, by say storing the event you just received in state, you want to use the exact

Re: [Discuss] Semantics of event time for state TTL

2019-04-05 Thread Konstantin Knauf
Hi Andrey, I agree with Elias. This would be the most natural behavior. I wouldn't add additional slightly different notions of time to Flink. As I can also see a use case for the combination
* Timestamp stored: Event timestamp
* Timestamp to check expiration: Processing Time
we could (maybe

Re: [Discuss] Semantics of event time for state TTL

2019-04-04 Thread Elias Levy
My 2c:
Timestamp stored with the state value: Event timestamp
Timestamp used to check expiration: Last emitted watermark
That follows the event time processing model used elsewhere in Flink. E.g. events are segregated into windows based on their event time, but the windows do not fire until the
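
A minimal sketch of the combination proposed in this message (event timestamp stored with the value, expiration checked against the last watermark), written in plain Java. The class `EventTimeTtlStore` and its methods are hypothetical names used only for illustration and are not Flink's state TTL API. The expiration rule `eventTimestamp + ttl <= watermark` is likewise an assumption, not something the thread specifies.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not Flink's state TTL API. */
class EventTimeTtlStore<K, V> {

    /** A stored value together with the event timestamp of the record that wrote it. */
    private static final class Entry<T> {
        final T value;
        final long eventTimestamp;

        Entry(T value, long eventTimestamp) {
            this.value = value;
            this.eventTimestamp = eventTimestamp;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    // Last watermark seen; Long.MIN_VALUE means "no watermark yet".
    private long currentWatermark = Long.MIN_VALUE;

    EventTimeTtlStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Stores the value with the event timestamp of the record being processed. */
    void put(K key, V value, long eventTimestamp) {
        entries.put(key, new Entry<>(value, eventTimestamp));
    }

    /** Watermarks only move forward, so a smaller value is ignored. */
    void advanceWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
    }

    /** Returns the value, or null if it is absent or expired relative to the watermark. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        // Assumed rule: expired once the watermark reaches eventTimestamp + ttl.
        if (entry.eventTimestamp + ttlMillis <= currentWatermark) {
            entries.remove(key);
            return null;
        }
        return entry.value;
    }
}
```

In this sketch, with a TTL of 1000 ms and a value written at event time 5000, the value stays readable while the watermark is below 6000 and is dropped on the first read after the watermark reaches 6000, regardless of wall-clock time.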

[Discuss] Semantics of event time for state TTL

2019-04-04 Thread Andrey Zagrebin
Hi All, As you might have already seen there is an effort tracked in FLINK-12005 [1] to support event time scale for state with time-to-live (TTL) [2]. While thinking about design, we realised that there can be multiple options for semantics of this feature, depending on use case. There is also