Hongshun Wang created FLINK-34634:
-------------------------------------

             Summary: Restarting the job will not read the changelog anymore if it stops before the synchronization of meta information is complete and some table is removed
                 Key: FLINK-34634
                 URL: https://issues.apache.org/jira/browse/FLINK-34634
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
            Reporter: Hongshun Wang
             Fix For: cdc-3.1.0
         Attachments: image-2024-03-09-15-25-26-187.png, image-2024-03-09-15-27-46-073.png

I removed a table from the job's options and then restarted the job from a savepoint, after which the job could no longer read the binlog. The logs contained an ERROR-level entry: 'The enumerator received invalid request meta group id 6, the valid meta group id range is [0, 4].'

It appears that the Reader is requesting more splits than the Enumerator knows about. However, the code is supposed to remove redundant split information from the Reader, as implemented in [https://github.com/ververica/flink-cdc-connectors/pull/2292]. So why does this issue occur?

!image-2024-03-09-15-25-26-187.png!

After examining the code, I found the cause: the issue occurs when the job stops before all of the split meta information has been synchronized and is then restarted.

Suppose the totalFinishedSplitSize of the binlogSplit in the Reader is 6 and no meta information has been synchronized yet, so the finishedSnapshotSplitInfos of the binlogSplit in the Reader is empty. After the restart, the Reader recomputes totalFinishedSplitSize as (6 - (0 - 0)), which is still 6, because there are no synchronized split infos from which the removed table's entries could be filtered out. In the Enumerator, however, the total is only 4 (the removed table had two splits). The Reader therefore keeps requesting meta groups beyond what the Enumerator holds, which leads to the out-of-range request.

!image-2024-03-09-15-27-46-073.png!
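
The arithmetic above can be reproduced with a small standalone sketch. It is not the connector's actual code: the class name, method names, and table names (t1, t2, t3) are hypothetical, and each finished split is modelled only by the name of the table it belongs to. It assumes the Reader adjusts its total by the number of already-synchronized entries that get filtered out, as described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitCountSketch {

    // Reader side (simplified assumption): on restart the stored total is reduced only by
    // the number of already-synchronized split infos that belong to removed tables.
    public static int readerTotalAfterRestart(
            int totalFinishedSplitSize, List<String> syncedSplitTables, Set<String> removedTables) {
        int before = syncedSplitTables.size();
        int after = 0;
        for (String table : syncedSplitTables) {
            if (!removedTables.contains(table)) {
                after++;
            }
        }
        return totalFinishedSplitSize - (before - after);
    }

    // Enumerator side (simplified assumption): only splits of tables that are still
    // captured are counted.
    public static int enumeratorTotal(List<String> allFinishedSplitTables, Set<String> removedTables) {
        int count = 0;
        for (String table : allFinishedSplitTables) {
            if (!removedTables.contains(table)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Six finished snapshot splits: two per table. Table t3 is removed before the restart.
        List<String> all = Arrays.asList("t1", "t1", "t2", "t2", "t3", "t3");
        Set<String> removed = new HashSet<>(Collections.singletonList("t3"));

        // Case from this issue: nothing was synchronized to the Reader before the job stopped.
        int reader = readerTotalAfterRestart(6, Collections.emptyList(), removed);
        int enumerator = enumeratorTotal(all, removed);
        System.out.println("reader expects " + reader + ", enumerator has " + enumerator);

        // For comparison: if all six infos had been synchronized, the totals would agree.
        System.out.println("fully synced reader expects " + readerTotalAfterRestart(6, all, removed));
    }
}
```

With nothing synchronized the Reader's total stays at 6 while the Enumerator reports 4, which is the mismatch behind the invalid meta group request. When all infos were synchronized before the stop, both sides arrive at 4.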