Elias Levy created FLINK-11517:
----------------------------------

             Summary: Inefficient window state access when using RocksDB state 
backend
                 Key: FLINK-11517
                 URL: https://issues.apache.org/jira/browse/FLINK-11517
             Project: Flink
          Issue Type: Bug
            Reporter: Elias Levy


When using an aggregate function on a window with a process function and the 
RocksDB state backend, state access is inefficient.

The WindowOperator calls windowState.add to merge the new element using the 
aggregate function.  The add method of RocksDBAggregatingState will read the 
state, deserialize the state, call the aggregate function, deserialize the 
state, and write it out.

If the trigger decides the window must be fired, as the the windowState.add 
does not return the state, the WindowOperator must call windowState.get to get 
it and pass it to the window process function, resulting in another read and 
deserialization.

Finally, while the state is not passed in to the trigger, in some cases the 
trigger may have a need to access the state.  That is our case.  As the state 
is not passed to the trigger, we must read and deserialize the state one more 
from within the trigger.

Thus, state must be read and deserialized three times to process a single 
element.  If the state is large, this can be quite costly.

 

Ideally  windowState.add would return the state, so that the WindowOperator can 
pass it to the process function without having to read it again.  Additionally, 
the state would be made available to the trigger to enable more use cases 
without having to go through the state descriptor again.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to