Yun Tang created FLINK-25478:
--------------------------------

             Summary: Changelog materialization with incremental checkpoint 
could cause checkpointed data lost
                 Key: FLINK-25478
                 URL: https://issues.apache.org/jira/browse/FLINK-25478
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing, Runtime / State Backends
            Reporter: Yun Tang
             Fix For: 1.15.0


Currently, changelog materialization calls the RocksDB state backend's 
snapshot method to generate an {{IncrementalRemoteKeyedStateHandle}} as the 
materialized artifact of {{ChangelogStateBackendHandleImpl}}. Until the next 
materialization, every checkpoint reports that same 
{{IncrementalRemoteKeyedStateHandle}}.

Registering this handle the first time is fine. However, when the same 
{{IncrementalRemoteKeyedStateHandle}} is registered a second time (via 
{{ChangelogStateBackendHandleImpl#registerSharedStates}}), its private state 
artifacts are discarded without checking the registered reference:

IncrementalRemoteKeyedStateHandle:
{code:java}
public void discardState() throws Exception {
    try {
        // Private state is deleted here with no check on the registered reference.
        StateUtil.bestEffortDiscardAllStateObjects(privateState.values());
    } catch (Exception e) {
        LOG.warn("Could not properly discard misc file states.", e);
    }
}
{code}

Thus, this deletes the private state (such as RocksDB's MANIFEST file), and 
on restore the job fails with a FileNotFoundException.
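
The failure mode described above can be illustrated with a toy model. This is 
only a sketch under stated assumptions, not Flink's code: {{ToyHandle}}, 
{{ToyRegistry}}, the {{checkReference}} flag, and the file name 
{{MANIFEST-000001}} are all hypothetical names invented for illustration. It 
shows how registering the same handle object twice can delete its private 
files when the discard path does not check whether that object is the one 
already registered.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DoubleRegistrationSketch {

    /** Hypothetical stand-in for a state handle that owns private files. */
    static final class ToyHandle {
        final Set<String> storage;      // files that currently exist "remotely"
        final Set<String> privateFiles; // files owned by this handle

        ToyHandle(Set<String> storage, Set<String> privateFiles) {
            this.storage = storage;
            this.privateFiles = privateFiles;
            storage.addAll(privateFiles);
        }

        /** Deletes the private files unconditionally, like the excerpt in the report. */
        void discardState() {
            storage.removeAll(privateFiles);
        }
    }

    /** Hypothetical registry: keeps the first handle per key, discards later ones. */
    static final class ToyRegistry {
        private final Map<String, ToyHandle> registered = new HashMap<>();
        private final boolean checkReference;

        ToyRegistry(boolean checkReference) {
            this.checkReference = checkReference;
        }

        void register(String key, ToyHandle handle) {
            ToyHandle existing = registered.putIfAbsent(key, handle);
            if (existing == null) {
                return; // first registration: nothing to discard
            }
            if (checkReference && existing == handle) {
                return; // same object is already registered: keep its files
            }
            handle.discardState(); // treated as a redundant duplicate
        }
    }

    /** Registers one handle twice and reports whether its MANIFEST file still exists. */
    static boolean manifestSurvives(boolean checkReference) {
        Set<String> storage = new HashSet<>();
        ToyHandle handle = new ToyHandle(storage, Set.of("MANIFEST-000001"));
        ToyRegistry registry = new ToyRegistry(checkReference);
        registry.register("materialization-1", handle); // first checkpoint
        registry.register("materialization-1", handle); // next checkpoint, same handle
        return storage.contains("MANIFEST-000001");
    }

    public static void main(String[] args) {
        System.out.println("without reference check: " + manifestSurvives(false));
        System.out.println("with reference check: " + manifestSurvives(true));
    }
}
```

In this toy model the file is gone after the second registration unless the 
registry first compares the incoming handle with the one it already holds. 
Whether an identity check of this kind is the right fix for Flink is not 
something the sketch can settle.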




--
This message was sent by Atlassian Jira
(v8.20.1#820001)
