Hongshun Wang created FLINK-34634:
-------------------------------------

             Summary: Restarting the job will not read the changelog anymore if 
it stops before the synchronization of meta information is complete and some 
table is removed
                 Key: FLINK-34634
                 URL: https://issues.apache.org/jira/browse/FLINK-34634
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
            Reporter: Hongshun Wang
             Fix For: cdc-3.1.0
         Attachments: image-2024-03-09-15-25-26-187.png, 
image-2024-03-09-15-27-46-073.png

I removed a table from the connector options and then restarted the job from a 
savepoint, but the job could no longer read the binlog. When I checked the 
logs, I found an ERROR-level entry stating:

' The enumerator received invalid request meta group id 6, the valid meta group 
id range is [0, 4].'

It appears that the Reader is requesting more splits than the Enumerator is 
aware of.

However, the code is already supposed to remove redundant split information 
from the Reader, as implemented in 
[https://github.com/ververica/flink-cdc-connectors/pull/2292]. So why does this 
issue still occur?

!image-2024-03-09-15-25-26-187.png!

Upon examining the code, I discovered the cause: the issue occurs if the job 
stops before all of the split meta information has been synchronized and is 
then restarted with a table removed. Suppose that the totalFinishedSplitSize of 
the binlogSplit in the Reader is 6 and no meta information has been 
synchronized yet, so the finishedSnapshotSplitInfos of the binlogSplit in the 
Reader is empty. After the restart, the totalFinishedSplitSize of the 
binlogSplit in the Reader equals 6 - (0 - 0), which is still 6, because the 
empty list contains no entries for the removed table to subtract. In the 
Enumerator, however, it is only 4 (the removed table had two splits). This can 
lead to an out-of-range request.
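
The arithmetic above can be reproduced with a small, self-contained sketch. It 
is not the Flink CDC implementation: the class SplitCountSketch, the method 
readerTotalAfterRestart and the table names db.t1, db.t2 and db.t3 are 
hypothetical and exist only for illustration. The sketch assumes, as described 
above, that the Reader adjusts its total by the number of already-synchronized 
entries that belong to removed tables.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the count adjustment; not the actual Flink CDC code.
class SplitCountSketch {

    // syncedSplitTables holds, for each finished split info the Reader has
    // already received, the table that split belongs to (an assumption made
    // to keep the model small).
    static int readerTotalAfterRestart(
            int totalBeforeRestart,
            List<String> syncedSplitTables,
            Set<String> capturedTables) {
        int sizeBeforeFilter = syncedSplitTables.size();
        int sizeAfterFilter = 0;
        for (String table : syncedSplitTables) {
            if (capturedTables.contains(table)) {
                sizeAfterFilter++;
            }
        }
        // Only entries that were actually synchronized can be subtracted.
        return totalBeforeRestart - (sizeBeforeFilter - sizeAfterFilter);
    }

    public static void main(String[] args) {
        // db.t3 was removed from the job; it had two finished splits.
        Set<String> captured = new HashSet<>(Arrays.asList("db.t1", "db.t2"));
        int enumeratorTotal = 4;

        // Case 1: all six entries were synchronized before the job stopped.
        List<String> fullySynced = Arrays.asList(
                "db.t1", "db.t1", "db.t2", "db.t2", "db.t3", "db.t3");
        int synced = readerTotalAfterRestart(6, fullySynced, captured);

        // Case 2: nothing was synchronized, so nothing can be subtracted.
        int unsynced = readerTotalAfterRestart(
                6, Collections.<String>emptyList(), captured);

        System.out.println("fully synced: reader=" + synced
                + ", enumerator=" + enumeratorTotal);
        System.out.println("not synced:   reader=" + unsynced
                + ", enumerator=" + enumeratorTotal);
    }
}
```

In the first case the Reader ends up with 6 - (6 - 4) = 4, matching the 
Enumerator. In the second case it keeps 6, which is the mismatch reported in 
this issue.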

!image-2024-03-09-15-27-46-073.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
