Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23607 )
Change subject: KUDU-3662: Fix race condition in Kudu replication ...................................................................... Patch Set 5: Code-Review+2 (4 comments) http://gerrit.cloudera.org:8080/#/c/23607/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23607/3//COMMIT_MSG@21 PS3, Line 21: to defer split removal until checkpoint : completes > Yes this can happen. Thanks a lot for the detailed response. Yes, that sounds like a reasonable approach to me. http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/ReplicationEnvProvider.java File java/kudu-replication/src/main/java/org/apache/kudu/replication/ReplicationEnvProvider.java: http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/ReplicationEnvProvider.java@85 PS3, Line 85: // Use AT_LEAST_ONCE mode since the replication job uses UPSERT semantics which are : // idempotent. This avoids the overhead of checkpoint barrier alignment while still : // providing fault tolerance. : env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE); > Actually we can only use AT_LEAST_ONCE semantics because the Flink Kudu sin Ah, indeed -- I guess tracking the state of what exact records have been read and replicated is cumbersome, and definitely the stateless design that relies on UPSERT is much simpler. Thanks. http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/ReplicationEnvProvider.java@101 PS3, Line 101: env.configure(config); > By default 10 minutes [1]. OK, this makes sense to me. Thanks! http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java File java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java: http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java@250 PS3, Line 250: after checkpoint completes do we remove fin > Ah yes this is a leftover from the development iterations. Removed the unas I see; thank you for the explanation. Yep, documenting the reliance on particular setting of the execution.checkpointing.max-concurrent-checkpoints is a good idea. BTW, maybe it's a good idea to enforce the setting to be always 1 then? If it makes sense, feel free just to file a JIRA about that and adding a TODO note, if needed. PS5 looks good to me in this regard with the explanation you provided. -- To view, visit http://gerrit.cloudera.org:8080/23607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I605a914aaa86b1bdf47537a5b21789de27972add Gerrit-Change-Number: 23607 Gerrit-PatchSet: 5 Gerrit-Owner: Marton Greber <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Gabriella Lotz <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Zoltan Chovan <[email protected]> Gerrit-Comment-Date: Thu, 20 Nov 2025 05:25:11 +0000 Gerrit-HasComments: Yes
