sahnib commented on PR #45674: URL: https://github.com/apache/spark/pull/45674#issuecomment-2038922607
> Reviewed code changes. Reviewing tests.
>
> Overall, I'd like to understand the use case where we need to set a different TTL per update. My gut feeling is that the main use case of state TTL is that users simply consider a grouping key A to never appear again after the TTL has expired (so they set a large enough TTL value), which doesn't actually require such a flexible TTL setup.

Discussed offline. The reason we originally decided on two APIs is that `ttlDuration` does not make sense for the event-time TTL mode. In event time, a user might want to derive the TTL from the event-time column value of the row being processed, or add a value to the watermark (however, the watermark for the first batch is always zero and then jumps significantly once the first batch is processed). Such an interface, however, complicates the API. It's hard to tell at this stage whether Spark users would ever want event-time TTL. If it's needed, we should first understand the exact use cases (how the TTL should be calculated in event time) and then support this mode.

Based on this discussion, we have decided to remove EventTimeTTL for now. Furthermore, to simplify the API, we accept a `ttlConfig` per state variable, which sets `ttlDuration` at the variable level.
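To make the "TTL at the variable level, not per update" semantics concrete, here is a minimal toy sketch of the idea in plain Scala. It is not Spark's actual implementation: `TtlConfig`, `TtlValueState`, and the `now` clock parameter are hypothetical names invented for illustration; the assumption is only that the TTL duration is fixed when the state variable is created, and each write refreshes the expiration from processing time.

```scala
import java.time.{Duration, Instant}

// Hypothetical per-variable TTL config: one duration, set at
// variable creation time rather than per update.
final case class TtlConfig(ttlDuration: Duration)

// Toy value-state holder: a value expires ttlDuration after the
// last write. `now` is injected so expiry can be tested without
// sleeping; Spark's real state store works differently.
final class TtlValueState[T](config: TtlConfig, now: () => Instant) {
  private var value: Option[T] = None
  private var expiresAt: Instant = Instant.MIN

  // Every update refreshes the expiration timestamp.
  def update(v: T): Unit = {
    value = Some(v)
    expiresAt = now().plus(config.ttlDuration)
  }

  // Reads return None once the TTL has elapsed.
  def get(): Option[T] =
    if (now().isBefore(expiresAt)) value else None
}
```

Because the duration lives in the config rather than in each `update` call, the caller cannot choose a different TTL per row, which is exactly the simplification described above.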
