Hey all,

I'm trying to migrate state from the RocksDB state backend to the filesystem state backend. The approach I'm taking is:

1) Cancel the job with a savepoint while it is running on RocksDB.
2) Update the job/cluster to use filesystem as the state backend.
3) Submit the job again, restoring from the RocksDB savepoint taken in step 1.

From what I understand about savepoints, shouldn't this work out of the box? It works in some cases but fails in others. Specifically, whenever the job has user-managed state, e.g. a ProcessFunction with a ValueState, the restore throws the following error:

Caused by: java.lang.IllegalStateException: Unexpected key-group in restore.
	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:418)
	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:315)
	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:95)
	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)

The error comes from a precondition check in HeapKeyedStateBackend <https://github.com/apache/flink/blob/release-1.5/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java#L418>. While debugging, I found that the value of writtenKeyGroupIndex <https://github.com/apache/flink/blob/release-1.5/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java#L410> always evaluates to 0, so the check fails.

Has anyone run into this issue before?

Thanks,
Lakshmi
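
P.S. To make the failing check concrete, here is a minimal standalone sketch of the pattern as I understand it. It is not the Flink source: the class KeyGroupCheckSketch, both method names, and the two-int layout are hypothetical and only illustrate the idea that each key-group section starts with its own index, which the reader compares against the index it expects. It makes no claim about why the real savepoint decodes to 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; names and layout are assumptions, not Flink APIs.
final class KeyGroupCheckSketch {

    // Writes one "key-group section": the key-group index first, then a payload.
    static byte[] writeKeyGroup(int keyGroupIndex, int payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(keyGroupIndex);
            out.writeInt(payload);
        }
        return bytes.toByteArray();
    }

    // Reads the section at the given offset and verifies the index written there
    // matches the one the reader expects, mirroring the precondition in the trace.
    static int restoreKeyGroup(byte[] data, int offset, int expectedKeyGroupIndex)
            throws IOException {
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            int writtenKeyGroupIndex = in.readInt();
            if (writtenKeyGroupIndex != expectedKeyGroupIndex) {
                throw new IllegalStateException("Unexpected key-group in restore.");
            }
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        // Section written in the layout the reader expects: the check passes.
        byte[] matching = writeKeyGroup(3, 42);
        System.out.println(restoreKeyGroup(matching, 0, 3));

        // Bytes that decode to index 0 while key-group 3 is expected: the check fails.
        byte[] zeros = new byte[8];
        try {
            restoreKeyGroup(zeros, 0, 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second call fails because the integer read at the offset is 0 while the reader expects key-group 3, which matches the symptom I see in the debugger.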