Re: 'Custom' mapping function on keyed WindowedStream

2018-02-26 Thread Seth Wiesman
I had to solve a similar problem. We use a process function with RocksDB and 
map state for the sub-keys. So while we hit RocksDB on every element, only the 
specified sub-keys are ever read from disk.
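
To make the access pattern concrete, here is a rough plain-Java sketch of it, not the actual job. It has no Flink dependency: SubKeyStore is a hypothetical stand-in for map state, and the "key/subKey" row encoding is only an assumption for illustration. The point is that each element does a read-modify-write of one sub-key entry, so the other sub-keys under the same key are never loaded.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical plain-Java stand-in for per-key map state; not a Flink class. */
class SubKeyStore {
    // One row per (key, subKey) pair instead of one serialized blob per key.
    private final Map<String, Long> rows = new HashMap<>();
    private long reads = 0;

    private static String rowKey(String key, String subKey) {
        return key + "/" + subKey; // illustrative composite-key encoding
    }

    Long get(String key, String subKey) {
        reads++; // count lookups so the access pattern is visible
        return rows.get(rowKey(key, subKey));
    }

    void put(String key, String subKey, long value) {
        rows.put(rowKey(key, subKey), value);
    }

    long reads() {
        return reads;
    }
}

/** Per-element logic: touches exactly one sub-key entry per call. */
class SubKeyCounter {
    private final SubKeyStore store;

    SubKeyCounter(SubKeyStore store) {
        this.store = store;
    }

    long processElement(String key, String subKey) {
        Long current = store.get(key, subKey);
        long updated = (current == null ? 0L : current) + 1;
        store.put(key, subKey, updated);
        return updated;
    }

    public static void main(String[] args) {
        SubKeyStore store = new SubKeyStore();
        SubKeyCounter counter = new SubKeyCounter(store);
        counter.processElement("user-1", "clicks");
        counter.processElement("user-1", "clicks");
        counter.processElement("user-1", "views");
        System.out.println("lookups performed: " + store.reads());
    }
}
```

In a real job the store would be keyed map state on the RocksDB backend and processElement would live in the process function; the sketch only shows the one-lookup-per-element shape.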

Seth Wiesman | Software Engineer
4 World Trade Center, 46th Floor, New York, NY 10007
swies...@mediamath.com

On 2/26/18, 6:32 AM, "Marchant, Hayden" wrote:

I would like to create a custom aggregator function for a windowed 
KeyedStream which I have complete control over - i.e. instead of implementing 
an AggregateFunction, I would like to control the lifecycle of the Flink state 
by implementing the CheckpointedFunction interface, though I still want this 
state to be per-key, per-window.

I am not sure which function I should be calling on the WindowedStream in 
order to invoke this custom functionality. I see from the documentation that 
CheckpointedFunction is for non-keyed state - which I guess eliminates this 
option.

A little background - I have logic that needs to hold a very large state in 
the operator - lots of counts by sub-key. Since only a subset of these 
aggregations are updated, I was interested in trying out incremental 
checkpointing in RocksDB. However, I don't want to be hitting RocksDB I/O on 
every update of state since we need very low latency, and instead wanted to 
hold the state in Java heap and then update the Flink state on checkpoint - i.e. 
something like CheckpointedFunction.
My assumption is that any update I make to RocksDB-backed state will hit 
the local disk - if this is wrong then I'll be happy.

What other options do I have?

Thanks,
Hayden Marchant





'Custom' mapping function on keyed WindowedStream

2018-02-26 Thread Marchant, Hayden
I would like to create a custom aggregator function for a windowed KeyedStream 
which I have complete control over - i.e. instead of implementing an 
AggregateFunction, I would like to control the lifecycle of the Flink state by 
implementing the CheckpointedFunction interface, though I still want this state 
to be per-key, per-window.

I am not sure which function I should be calling on the WindowedStream in order 
to invoke this custom functionality. I see from the documentation that 
CheckpointedFunction is for non-keyed state - which I guess eliminates this 
option.

A little background - I have logic that needs to hold a very large state in the 
operator - lots of counts by sub-key. Since only a subset of these 
aggregations are updated, I was interested in trying out incremental 
checkpointing in RocksDB. However, I don't want to be hitting RocksDB I/O on 
every update of state since we need very low latency, and instead wanted to 
hold the state in Java heap and then update the Flink state on checkpoint - i.e. 
something like CheckpointedFunction.
My assumption is that any update I make to RocksDB-backed state will hit the 
local disk - if this is wrong then I'll be happy.
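
To illustrate what I have in mind, here is a minimal plain-Java sketch of the buffering idea only. It uses no Flink APIs: BufferedCounts and flushTo are hypothetical names, and the durable store is just a Map standing in for backend state. It does not address how the flush would be tied to Flink's checkpoint lifecycle or to per-key, per-window scoping, which is exactly the part I am unsure about.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical heap-side buffer that writes only changed sub-keys on flush. */
class BufferedCounts {
    private final Map<String, Long> heapCounts = new HashMap<>();
    private final Set<String> dirty = new HashSet<>();

    /** Hot path: updates the heap copy and remembers which sub-key changed. */
    void increment(String subKey) {
        heapCounts.merge(subKey, 1L, Long::sum);
        dirty.add(subKey);
    }

    /**
     * Checkpoint path: copies only the changed sub-keys into the durable
     * store (a plain Map here) and returns how many entries were written.
     */
    int flushTo(Map<String, Long> durableStore) {
        int written = 0;
        for (String subKey : dirty) {
            durableStore.put(subKey, heapCounts.get(subKey));
            written++;
        }
        dirty.clear();
        return written;
    }

    public static void main(String[] args) {
        BufferedCounts counts = new BufferedCounts();
        Map<String, Long> durable = new HashMap<>();
        counts.increment("a");
        counts.increment("a");
        counts.increment("b");
        System.out.println("entries written: " + counts.flushTo(durable));
    }
}
```

The appeal is that updates between two checkpoints never touch the store, and each flush writes one entry per changed sub-key rather than the whole set of counts.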

What other options do I have?

Thanks,
Hayden Marchant