David Yan created APEXMALHAR-2244:
-------------------------------------

             Summary: Optimize WindowedStorage and Spillable data structures for time series
                 Key: APEXMALHAR-2244
                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2244
             Project: Apache Apex Malhar
          Issue Type: Sub-task
            Reporter: David Yan
            Assignee: Siyuan Hua

The spillable data structures currently do not make any assumption about the key that is used in Managed State, and as a result, they use ManagedStateImpl to interface with Managed State.

But for WindowedStorage used by WindowedOperator, the key to the storage is a window, which is time based. Using the default ManagedStateImpl would be wrong for event-time-based keys, since ManagedStateImpl appears to purge data based on the Apex window ID (which is processing-time based).

At a high level, the following roughly summarizes what needs to be done:

1. A way to tell the spillable data structures to use ManagedTimeUnifiedStateImpl.
2. A way to tell the spillable data structures how to extract the timestamp from the key. Note that in the case of WindowedOperator, the timestamp should be the end timestamp of the window (beginTimeMillis + durationMillis), not the begin timestamp.
3. A way to tell the spillable data structures how to assign the time bucket given that timestamp.
4. Only purge a time bucket when all keys that belong to that time bucket have been removed.
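
To make items 2 to 4 concrete, here is a rough, standalone Java sketch. It does not use the Malhar API: the names TimeBucketSketch, Window, endTimestamp, bucketFor and BucketTracker are all hypothetical, and the fixed-width bucketing is only one possible assignment policy, assumed here for illustration. It shows the end-timestamp extraction, a bucket assignment, and a purge check that only fires once a bucket has no live keys left.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: none of these types exist in Apex Malhar.
class TimeBucketSketch {

  // Minimal stand-in for a time-based window key (begin timestamp plus duration).
  static final class Window {
    final long beginTimeMillis;
    final long durationMillis;

    Window(long beginTimeMillis, long durationMillis) {
      this.beginTimeMillis = beginTimeMillis;
      this.durationMillis = durationMillis;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Window)) {
        return false;
      }
      Window w = (Window) o;
      return beginTimeMillis == w.beginTimeMillis && durationMillis == w.durationMillis;
    }

    @Override
    public int hashCode() {
      return 31 * Long.hashCode(beginTimeMillis) + Long.hashCode(durationMillis);
    }
  }

  // Item 2: the timestamp of a window key is its END, not its begin.
  static long endTimestamp(Window w) {
    return w.beginTimeMillis + w.durationMillis;
  }

  // Item 3: assumed policy, fixed-width buckets of bucketSpanMillis each.
  static long bucketFor(long timestampMillis, long bucketSpanMillis) {
    return Math.floorDiv(timestampMillis, bucketSpanMillis);
  }

  // Item 4: tracks live keys per bucket so a bucket is purged only when empty.
  static final class BucketTracker {
    private final long bucketSpanMillis;
    private final Map<Long, Set<Window>> liveKeys = new HashMap<>();

    BucketTracker(long bucketSpanMillis) {
      this.bucketSpanMillis = bucketSpanMillis;
    }

    // Registers a key and returns the bucket it was assigned to.
    long put(Window key) {
      long bucket = bucketFor(endTimestamp(key), bucketSpanMillis);
      liveKeys.computeIfAbsent(bucket, b -> new HashSet<>()).add(key);
      return bucket;
    }

    // Removes a key; returns true only if its bucket is now empty and may be purged.
    boolean remove(Window key) {
      long bucket = bucketFor(endTimestamp(key), bucketSpanMillis);
      Set<Window> keys = liveKeys.get(bucket);
      if (keys == null || !keys.remove(key)) {
        return false; // unknown key: nothing changed, nothing to purge
      }
      if (keys.isEmpty()) {
        liveKeys.remove(bucket);
        return true;
      }
      return false;
    }
  }

  public static void main(String[] args) {
    BucketTracker tracker = new BucketTracker(60_000L);
    Window w = new Window(0L, 60_000L);
    System.out.println("bucket = " + tracker.put(w));
    System.out.println("purgeable after remove = " + tracker.remove(w));
  }
}
```

In a real change, the extraction and bucket-assignment functions would presumably be supplied to the spillable data structures as pluggable callbacks (item 1 and items 2 to 3), and the purge decision would be delegated to the managed state implementation rather than tracked in an in-memory map as above.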