Re: State TTL in Flink 1.6.0

Chesnay Schepler Wed, 22 Aug 2018 02:01:49 -0700

Just a quick note for the docs: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#state-time-to-live-ttl


On 22.08.2018 10:53, Juho Autio wrote:

First, I couldn't find anything about State TTL in the Flink docs. Is there anything like that? I can manage based on Javadocs & source code, but just wondering.
Then to the main question: why doesn't the TTL support event time, and is there any sensible use case for the TTL if the streaming characteristic of my job is event time?
I have a job that is cleaning up old entries from a keyed MapState by calling registerEventTimeTimer & implementing the onTimer method. This way I can keep the state for a certain time in _event time_.
That's more complicated code than it would have to be, so I wanted to convert my function to use Flink's own state TTL. I started writing this:
        MapStateDescriptor<String, String> stateDesc = new MapStateDescriptor<>(
                "deviceState", String.class, String.class);
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.milliseconds(stateRetentionMillis))
                // TODO EventTime is not supported?
                .setTimeCharacteristic(StateTtlConfig.TimeCharacteristic.ProcessingTime)
                .build();
        stateDesc.enableTimeToLive(ttlConfig);
So, I realized that ProcessingTime is the only existing TimeCharacteristic in StateTtlConfig.
Based on some comments in Flink tickets it seems that it was a conscious choice, because supporting EventTime TTL would be much heavier:
https://issues.apache.org/jira/browse/FLINK-3089?focusedCommentId=16318013&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16318013
So I can't exactly match the current behaviour that guarantees to keep the state available for 24 hours (or whatever is passed as --stateRetentionMillis).
However, if we accept the restriction and switch to processing time in state cleanup, what does it mean?
- As long as the stream keeps up with the input rate (from Kafka), there's no big difference, because 24 hours in processing time ~= 24 hours in event time.
- If the stream is lagging behind a lot, then it would be possible that the state is cleaned "too early". However, we aim at not having a lot of lag, so this is not a real issue: the job would be scaled up to catch up before it starts lagging so much that we get misses because of cleared state. Still, if we fail to scale up quickly enough, the state might be cleared too early and cause real trouble.
- One problem is that if the stream is quickly processing a long backlog (say, start streaming 7 days back in event time), then the state size can temporarily grow bigger than usual. Maybe this wouldn't be a big problem, but it could at least require extra upscaling of resources.
- After restoring from a savepoint, the state is older in processing time by the length of the downtime caused by the job restart. Even this is not a huge issue as long as the deployment downtime is short compared to the 24 hour TTL.
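The lagging and backlog cases above can be made concrete with a small plain-Java model. This is only a sketch under stated assumptions: it does not use any Flink API, the class and method names (TtlClockComparison, isExpired) are made up for illustration, and the 1 hour, 10 hour and 25 hour figures are hypothetical. It treats an entry as expired once at least the TTL has elapsed on whichever clock is used (wall clock for processing time, watermark for event time).

```java
import java.time.Duration;

// Hypothetical illustration only; not Flink code.
class TtlClockComparison {

    // An entry counts as expired once at least ttlMillis has passed
    // on the chosen clock since the entry was last updated.
    static boolean isExpired(long lastUpdate, long now, long ttlMillis) {
        return now - lastUpdate >= ttlMillis;
    }

    public static void main(String[] args) {
        long ttl = Duration.ofHours(24).toMillis();

        // Backlog replay (assumed numbers): 7 days of event time are
        // consumed in 1 hour of wall-clock time. The entry was written at 0
        // on both clocks.
        long replayWallClock = Duration.ofHours(1).toMillis();
        long replayWatermark = Duration.ofDays(7).toMillis();
        System.out.println("backlog, processing-time TTL expired: "
                + isExpired(0L, replayWallClock, ttl));
        System.out.println("backlog, event-time TTL expired:      "
                + isExpired(0L, replayWatermark, ttl));

        // Lagging job (assumed numbers): 25 hours of wall-clock time pass
        // while the watermark advances by only 10 hours.
        long lagWallClock = Duration.ofHours(25).toMillis();
        long lagWatermark = Duration.ofHours(10).toMillis();
        System.out.println("lagging, processing-time TTL expired: "
                + isExpired(0L, lagWallClock, ttl));
        System.out.println("lagging, event-time TTL expired:      "
                + isExpired(0L, lagWatermark, ttl));
    }
}
```

In the backlog case the processing-time clock keeps the entry although it is 7 days old in event time (state grows), and in the lagging case it drops the entry although only 10 hours of event time have passed (state cleared "too early").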
Anyway, with all these issues combined, I'm a bit confused about the whole TTL feature. Can it be used in event-time-based streaming in any sensible way? It seems like it would be more like a cache then, and can't be relied on well enough.
Thanks.

Juho
