sahnib commented on PR #45674: URL: https://github.com/apache/spark/pull/45674#issuecomment-2038922607
> Reviewed code changes. Reviewing tests.
>
> Overall, I'd like to understand the use case where we need to set a different TTL per update. My gut feeling is that the main use case of state TTL is that users simply consider a grouping key A to never appear again after the TTL has expired (so they set a large enough TTL value), which doesn't actually require such a flexible TTL setup.

Discussed offline. The reason we originally decided on two APIs is that `ttlDuration` does not make sense for the event-time TTL mode. In event time, a user might want to derive the TTL from the event-time column value of the row being processed, or add a value to the watermark (however, the watermark for the first batch is always zero and then jumps significantly once the first batch is processed). Such an interface, however, complicates the API. It's hard to tell at this stage whether Spark users would ever want event-time TTL. If it's needed, we should first understand the exact use cases (how the TTL should be calculated in event time) and then support this mode.

Based on this discussion, we have decided to remove EventTimeTTL for now. Furthermore, to simplify the API, we accept a `ttlConfig` per state variable, which sets `ttlDuration` at the variable level.
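To make the "TTL at the variable level, not per update" semantics concrete, here is a minimal toy sketch of the idea in plain Scala. It is not Spark's actual implementation: `TtlConfig`, `TtlValueState`, and the `now` clock parameter are hypothetical names invented for illustration; the assumption is only that the TTL duration is fixed when the state variable is created, and each write refreshes the expiration from processing time.

```scala
import java.time.{Duration, Instant}

// Hypothetical per-variable TTL config: one duration, set at
// variable creation time rather than per update.
final case class TtlConfig(ttlDuration: Duration)

// Toy value-state holder: a value expires ttlDuration after the
// last write. `now` is injected so expiry can be tested without
// sleeping; Spark's real state store works differently.
final class TtlValueState[T](config: TtlConfig, now: () => Instant) {
  private var value: Option[T] = None
  private var expiresAt: Instant = Instant.MIN

  // Every update refreshes the expiration timestamp.
  def update(v: T): Unit = {
    value = Some(v)
    expiresAt = now().plus(config.ttlDuration)
  }

  // Reads return None once the TTL has elapsed.
  def get(): Option[T] =
    if (now().isBefore(expiresAt)) value else None
}
```

Because the duration lives in the config rather than in each `update` call, the caller cannot choose a different TTL per row, which is exactly the simplification described above.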
