Hey all,

I'm trying to migrate state from the RocksDB state backend to the filesystem
state backend. The approach I'm taking is:
1) Cancel the job with a savepoint while it's running on RocksDB
2) Update the job/cluster to use filesystem as the state backend
3) Submit the job with the previous RocksDB savepoint

From what I understand about savepoints, this should work out of the box?
However, it works in some cases but fails in others. Specifically, whenever
a job has user-managed state, e.g. a ProcessFunction with a ValueState, it
throws the following error:

Caused by: java.lang.IllegalStateException: Unexpected key-group in restore.
        at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:418)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:315)
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:95)
        at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
        at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)

The error comes from a precondition check in HeapKeyedStateBackend:
<https://github.com/apache/flink/blob/release-1.5/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java#L418>

While debugging, I found that the value of writtenKeyGroupIndex
<https://github.com/apache/flink/blob/release-1.5/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java#L410>
always evaluates to 0, thus failing the check.
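
To show the shape of the check without opening the Flink source, here is a
minimal stand-alone sketch of how I understand it. This is not the Flink
code: the class name, method names and byte layout are made up for
illustration. It only assumes that the reader expects a 4-byte key-group
index at the start of each key group and compares it with the index it is
about to restore.

```java
import java.nio.ByteBuffer;

final class KeyGroupRestoreSketch {

    // Hypothetical writer: key-group index as a 4-byte int, then the payload.
    // The real snapshot layout is more involved; this is an assumption.
    static byte[] writeKeyGroup(int keyGroupIndex, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(keyGroupIndex)
                .put(payload)
                .array();
    }

    // Hypothetical reader: read the index found in the bytes and compare it
    // with the index the restoring backend expects at this position.
    static void restoreKeyGroup(byte[] data, int expectedKeyGroupIndex) {
        int writtenKeyGroupIndex = ByteBuffer.wrap(data).getInt();
        if (writtenKeyGroupIndex != expectedKeyGroupIndex) {
            throw new IllegalStateException("Unexpected key-group in restore.");
        }
    }

    public static void main(String[] args) {
        // Bytes written in the layout the reader expects: no exception.
        restoreKeyGroup(writeKeyGroup(7, new byte[] {1, 2, 3}), 7);

        // Bytes in some other layout (here they simply start with zeros):
        // the reader sees index 0 instead of 7 and the check fails.
        byte[] otherLayout = new byte[] {0, 0, 0, 0, 42};
        try {
            restoreKeyGroup(otherLayout, 7);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch a read of 0 where another index is expected is enough to
trip the check. Whether that is what happens with a RocksDB savepoint is
exactly what I'm unsure about.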

Has anyone run into this issue before?

Thanks
Lakshmi
