Re: Large rocksdb state restore/checkpoint duration behavior

2018-10-23 Thread Aminouvic
Hello, Thank you for your answer and apologies for the late response. For timers we are using: state.backend.rocksdb.timer-service.factory: rocksdb. Are we still affected by [1]? Regarding interruptibility, we have coalesced our timers and the application became more responsive to stop
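
For reference, the timer-service setting mentioned above is a cluster-level option in flink-conf.yaml. A minimal sketch of the relevant configuration, assuming the RocksDB backend with incremental checkpoints described later in the thread (the checkpoint path is illustrative):

# flink-conf.yaml (sketch; values are assumptions)
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: hdfs:///flink/checkpoints
state.backend.rocksdb.timer-service.factory: rocksdb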

Re: Large rocksdb state restore/checkpoint duration behavior

2018-10-10 Thread Stefan Richter
Hi, I would assume that the problem of blocked processing during a checkpoint is caused by [1], since you mentioned using RocksDB incremental checkpoints and it could be that you use them in combination with heap-based timers. This is the one combination that currently still uses a
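
For illustration only, a minimal Java sketch (not from the original thread) of the combination described above: RocksDB state backend with incremental checkpoints enabled while timers remain in the default heap-based timer service. The HDFS path and checkpoint interval are assumptions.

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Second constructor argument enables incremental checkpoints; the path is a placeholder.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
        env.enableCheckpointing(60_000L); // assumed checkpoint interval of one minute

        // Unless state.backend.rocksdb.timer-service.factory is set to "rocksdb" in
        // flink-conf.yaml, timer state stays on the heap and is snapshotted
        // synchronously during each checkpoint, which can block record processing.

        // ... build and execute the job here ...
    }
}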

Large rocksdb state restore/checkpoint duration behavior

2018-10-10 Thread Aminouvic
Hi, We are using Flink 1.6.1 on YARN with RocksDB as backend, incrementally checkpointed to HDFS (for both data and timers). The job reads events from Kafka (~1 billion events per day), constructs user sessions using an EventTimeSessionWindow coupled with a late firing trigger and a WindowFunction with
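
As a rough illustration of the pipeline described above, a Java sketch only: the topic name, broker address, session gap, and lateness are assumptions, and trivial stand-ins replace the real key extraction, custom late-firing trigger, and WindowFunction logic (allowedLateness is used here merely as a placeholder for late-event handling).

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.util.Collector;

public class UserSessionJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.enableCheckpointing(60_000L); // assumed checkpoint interval

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder brokers
        props.setProperty("group.id", "user-sessions");       // placeholder group id

        DataStream<String> events = env
            .addSource(new FlinkKafkaConsumer011<>("events", new SimpleStringSchema(), props))
            .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<String>(Time.minutes(1)) {
                    @Override
                    public long extractTimestamp(String element) {
                        // stand-in; the real job parses the event-time timestamp from the record
                        return System.currentTimeMillis();
                    }
                });

        events
            .keyBy(new KeySelector<String, String>() {
                @Override
                public String getKey(String value) {
                    return value; // stand-in for extracting the user id from the event
                }
            })
            .window(EventTimeSessionWindows.withGap(Time.minutes(30))) // assumed session gap
            .allowedLateness(Time.hours(1)) // placeholder for the custom late-firing behavior
            .apply(new WindowFunction<String, String, String, TimeWindow>() {
                @Override
                public void apply(String key, TimeWindow window,
                                  Iterable<String> input, Collector<String> out) {
                    // trivial stand-in for the real session aggregation logic
                    int count = 0;
                    for (String ignored : input) {
                        count++;
                    }
                    out.collect(key + ": " + count + " events in " + window);
                }
            })
            .print();

        env.execute("User session sketch");
    }
}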