Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23607 )

Change subject: KUDU-3662: Fix race condition in Kudu replication
......................................................................


Patch Set 5: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/23607/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23607/3//COMMIT_MSG@21
PS3, Line 21: to defer split removal until checkpoint
            : completes
> Yes this can happen.
Thanks a lot for the detailed response.  Yes, that sounds like a reasonable 
approach to me.


http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/ReplicationEnvProvider.java
File 
java/kudu-replication/src/main/java/org/apache/kudu/replication/ReplicationEnvProvider.java:

http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/ReplicationEnvProvider.java@85
PS3, Line 85:     // Use AT_LEAST_ONCE mode since the replication job uses 
UPSERT semantics which are
            :     // idempotent. This avoids the overhead of checkpoint barrier 
alignment while still
            :     // providing fault tolerance.
            :     
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
> Actually we can only use AT_LEAST_ONCE semantics because the Flink Kudu sin
Ah, indeed -- I guess tracking the state of what exact records have been read 
and replicated is cumbersome, and definitely the stateless design that relies 
on UPSERT is much simpler.  Thanks.


http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/ReplicationEnvProvider.java@101
PS3, Line 101:     env.configure(config);
> By default 10 minutes [1].
OK, this makes sense to me.  Thanks!


http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java
File 
java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java:

http://gerrit.cloudera.org:8080/#/c/23607/3/java/kudu-replication/src/main/java/org/apache/kudu/replication/wrappedsource/MetricWrappedKuduEnumerator.java@250
PS3, Line 250: after checkpoint completes do we remove fin
> Ah yes this is a leftover from the development iterations. Removed the unas
I see; thank you for the explanation.

Yep, documenting the reliance on particular setting of the 
execution.checkpointing.max-concurrent-checkpoints is a good idea.  BTW, maybe 
it's a good idea to enforce the setting to be always 1 then?  If it makes 
sense, feel free just to file a JIRA about that and adding a TODO note, if 
needed.  PS5 looks good to me in this regard with the explanation you provided.



--
To view, visit http://gerrit.cloudera.org:8080/23607
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I605a914aaa86b1bdf47537a5b21789de27972add
Gerrit-Change-Number: 23607
Gerrit-PatchSet: 5
Gerrit-Owner: Marton Greber <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Comment-Date: Thu, 20 Nov 2025 05:25:11 +0000
Gerrit-HasComments: Yes

Reply via email to