Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]

2024-04-25 Thread via GitHub


PatrickRen merged PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]

2024-04-17 Thread via GitHub


morazow commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1569863282


##
flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java:
##
@@ -163,10 +163,18 @@ public String toString() {
 // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }

Review Comment:
   Got it, so it will always be the minimum value.
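
A minimal sketch of why the loop in the diff always ends up with the minimum, whatever order the high watermarks arrive in. `SimpleOffset` is a hypothetical stand-in that models an offset as a single `long`; it is not the real `Offset` class from flink-cdc-base, and the positions are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EarliestWatermarkSketch {

    /** Hypothetical offset: one long position plus an isBefore check. */
    public static final class SimpleOffset {
        public final long position;

        public SimpleOffset(long position) {
            this.position = position;
        }

        public boolean isBefore(SimpleOffset other) {
            return position < other.position;
        }
    }

    /** Same loop shape as the diff: keep the smallest offset seen so far. */
    public static SimpleOffset earliest(SimpleOffset start, List<SimpleOffset> highWatermarks) {
        SimpleOffset result = start;
        for (SimpleOffset highWatermark : highWatermarks) {
            // Compared against the running result, not the original start,
            // so a later, smaller watermark still replaces an earlier pick.
            if (highWatermark.isBefore(result)) {
                result = highWatermark;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleOffset start = new SimpleOffset(100);
        List<SimpleOffset> highWatermarks = Arrays.asList(
                new SimpleOffset(90),
                new SimpleOffset(70),
                new SimpleOffset(120),
                new SimpleOffset(80));

        System.out.println(earliest(start, highWatermarks).position); // 70

        // Reversing the input order does not change the outcome.
        Collections.reverse(highWatermarks);
        System.out.println(earliest(start, highWatermarks).position); // 70
    }
}
```

Because each comparison is made against the running value, several high watermarks before the starting offset resolve to the earliest one rather than to whichever happens to be visited first.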






Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]

2024-04-17 Thread via GitHub


morazow commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1569850914


##
flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java:
##
@@ -163,10 +163,18 @@ public String toString() {
 // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }

Review Comment:
   Do we have to distinguish between the high watermarks that are before the
startingOffset? For example, if multiple high watermarks are before
startingOffset, which one should we take? Should it be the latest of those?
   
   Or is it all right to take any highWatermark that is before the
startingOffset?






Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]

2024-04-17 Thread via GitHub


loserwang1024 commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1569794515


##
flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java:
##
@@ -159,7 +159,15 @@ public String toString() {
 // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }
         splitInfos.addAll(streamSplit.getFinishedSnapshotSplitInfos());
+
         return new StreamSplit(
                 streamSplit.splitId,
                 streamSplit.getStartingOffset(),

Review Comment:
   Done.






Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]

2024-04-17 Thread via GitHub


yuxiqian commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1568608219


##
flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java:
##
@@ -159,7 +159,15 @@ public String toString() {
 // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }
         splitInfos.addAll(streamSplit.getFinishedSnapshotSplitInfos());
+
         return new StreamSplit(
                 streamSplit.splitId,
                 streamSplit.getStartingOffset(),

Review Comment:
   CMIIW, but it seems the newly added code only calculates the earliest
starting offset into `startingOffset` and never uses it to build the new
`StreamSplit`. Maybe a change was missed here?
   
   ```suggestion
           return new StreamSplit(
                   streamSplit.splitId,
                   startingOffset,
   ```






Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]

2024-04-17 Thread via GitHub


loserwang1024 commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1568614060


##
flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java:
##
@@ -159,7 +159,15 @@ public String toString() {
 // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }
         splitInfos.addAll(streamSplit.getFinishedSnapshotSplitInfos());
+
         return new StreamSplit(
                 streamSplit.splitId,
                 streamSplit.getStartingOffset(),

Review Comment:
   That seems right.






Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]

2024-04-16 Thread via GitHub


loserwang1024 commented on PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#issuecomment-2060235695

   @PatrickRen, @morazow, CC





[PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]

2024-04-16 Thread via GitHub


loserwang1024 opened a new pull request, #3230:
URL: https://github.com/apache/flink-cdc/pull/3230

   In MySQL CDC, MySqlBinlogSplit#appendFinishedSplitInfos re-calculates the
starting binlog offset after a new table is added, but
StreamSplit#appendFinishedSplitInfos lacks the same step. This causes data loss
if the high watermark of any newly added table's snapshot split is smaller than
the current starting offset.
   
   Some unstable tests are caused by this.
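
A rough illustration of the data-loss scenario described above, under simplifying assumptions: offsets and change-event positions are plain `long` values, and the reader emits every event at or after its starting offset. `SkippedChangesSketch`, `readFrom`, and all the numbers are hypothetical; none of this is Flink CDC code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SkippedChangesSketch {

    /** Positions of the change events a reader would emit when starting at startingOffset. */
    public static List<Long> readFrom(long startingOffset, List<Long> eventPositions) {
        return eventPositions.stream()
                .filter(position -> position >= startingOffset)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Change events for the newly added table, written after its snapshot finished.
        List<Long> newTableEvents = Arrays.asList(85L, 90L, 110L);

        long oldStartingOffset = 100L;     // starting offset kept by the stream split
        long newSplitHighWatermark = 80L;  // high watermark of the new snapshot split

        // Without re-calculation, the events at 85 and 90 are never read.
        System.out.println(readFrom(oldStartingOffset, newTableEvents)); // [110]

        // Starting from the smaller of the two offsets picks them up again.
        long recalculated = Math.min(oldStartingOffset, newSplitHighWatermark);
        System.out.println(readFrom(recalculated, newTableEvents)); // [85, 90, 110]
    }
}
```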

