David Yan created APEXMALHAR-2244:
-------------------------------------

             Summary: Optimize WindowedStorage and Spillable data structures for time series
                 Key: APEXMALHAR-2244
                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2244
             Project: Apache Apex Malhar
          Issue Type: Sub-task
            Reporter: David Yan
            Assignee: Siyuan Hua


The spillable data structures currently do not make any assumptions about the 
key that is used in Managed State, and as a result, they use ManagedStateImpl to 
interface with Managed State. But for WindowedStorage used by WindowedOperator, 
the key to the storage is a window, which is time based. Using the default 
ManagedStateImpl would be wrong for event time based keys, since 
ManagedStateImpl appears to purge data based on the Apex window id (processing 
time based).

At a high level, the following roughly summarizes what needs to be done:

1. a way to tell the spillable data structures to use the 
ManagedTimeUnifiedStateImpl
2. a way to tell the spillable data structures how to extract the timestamp 
from the key. Note that in the case of WindowedOperator, the timestamp should 
be the end timestamp of the window (beginTimeMillis + durationMillis), not the 
begin timestamp.
3. a way to tell the spillable data structures how to assign the time bucket 
given that timestamp
4. only purge a time bucket when all keys that belong to that time bucket are 
removed
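
To make items 2 through 4 concrete, here is a minimal sketch in plain Java. It 
does not use any Malhar API: TimeBucketSketch, TimeWindow, bucketSpanMillis, 
keyAdded and keyRemoved are all hypothetical names made up for illustration, 
and a fixed-width bucket is only one possible assignment scheme. It assumes the 
caller reports each distinct key exactly once on add and once on remove.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only; none of these names exist in Apex Malhar. */
final class TimeBucketSketch {

    /** Hypothetical stand-in for a time-based window key. */
    static final class TimeWindow {
        final long beginTimeMillis;
        final long durationMillis;

        TimeWindow(long beginTimeMillis, long durationMillis) {
            this.beginTimeMillis = beginTimeMillis;
            this.durationMillis = durationMillis;
        }

        /** Item 2: the timestamp of a window key is its END, not its begin. */
        long endTimeMillis() {
            return beginTimeMillis + durationMillis;
        }
    }

    private final long bucketSpanMillis;
    /** Number of live keys currently assigned to each time bucket. */
    private final Map<Long, Integer> liveKeysPerBucket = new HashMap<>();

    TimeBucketSketch(long bucketSpanMillis) {
        if (bucketSpanMillis <= 0) {
            throw new IllegalArgumentException("bucketSpanMillis must be positive");
        }
        this.bucketSpanMillis = bucketSpanMillis;
    }

    /** Item 3: assign a time bucket from the extracted timestamp (assumed fixed-width). */
    long timeBucketFor(TimeWindow window) {
        return Math.floorDiv(window.endTimeMillis(), bucketSpanMillis);
    }

    /** Record that a key now lives in its bucket. Call once per distinct key. */
    void keyAdded(TimeWindow window) {
        liveKeysPerBucket.merge(timeBucketFor(window), 1, Integer::sum);
    }

    /**
     * Item 4: record a key removal. Returns true only when no live keys
     * remain in the bucket, i.e. the bucket may now be purged.
     */
    boolean keyRemoved(TimeWindow window) {
        Integer remaining = liveKeysPerBucket.computeIfPresent(
            timeBucketFor(window), (bucket, count) -> count > 1 ? count - 1 : null);
        return remaining == null;
    }
}
```

With a 1000 ms bucket span, a window beginning at 900 ms with a 500 ms duration 
lands in bucket 1 (end timestamp 1400 ms), whereas bucketing by its begin 
timestamp would have put it in bucket 0. That difference is why item 2 calls 
for the end timestamp.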





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
