Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/22867 )
Change subject: KUDU-3665 Send no-op heartbeat operations batched PART1 ...................................................................... Patch Set 8: (7 comments) http://gerrit.cloudera.org:8080/#/c/22867/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22867/7//COMMIT_MSG@7 PS7, Line 7: Send no-op > one send is enough :) Done http://gerrit.cloudera.org:8080/#/c/22867/7//COMMIT_MSG@15 PS7, Line 15: fewer RPC calls > Do we want to capture the number of rpc calls between the servers as a part If there is no client activity, then the number of RPC calls sent to other tablet servers is exactly the same as no_op_heartbeat_count when batching is turned off, or heartbeat_batch_count when batching is turned on, since this is the only activity occurring. I experimented with multi_raft_batch_size and took measurements with a value of 30. Then I pushed the default value of 10 (there was no significant difference). I will update it to 30 so that the measurements in the commit message reflect the default value. This also shows that even batching such a very small and easy-to-process message only resulted in a ~12% improvement. So, I think instead of generic RPC batching, the next step should be to optimize the response time for these no-op messages. In case there is write/read activity, we should decide on the size and number of parallel operations. The measurements below were taken with 2000 tablets and 5 parallel writes (10 hash splits). So, 1950 tablets still send normal no-ops, while 50 tablets send heartbeats with actual operations. Is this the setup we want to measure, or should we use a different one? http://gerrit.cloudera.org:8080/#/c/22867/7//COMMIT_MSG@29 PS7, Line 29: if > nit: if Done http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/consensus/multi_raft_batcher.h File src/kudu/consensus/multi_raft_batcher.h: http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/consensus/multi_raft_batcher.h@18 PS5, Line 18: #pragma once : > style nit: in new code, please use '#pragma once' instead of include guard Done http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/consensus/multi_raft_batcher.h@104 PS5, Line 104: : // Timer to flush the buffer if it was not flushed by the batch size for : // FLAGS_multi_raft_heartbeat_interval_ms ms. : std::shared_ptr<rpc::PeriodicTimer> batch_sender_; : : // Mutex for data in current_batch_ : std::mutex current_batch_mutex_; > Please document all these non-trivial fields. Done http://gerrit.cloudera.org:8080/#/c/22867/7/src/kudu/consensus/multi_raft_batcher.cc File src/kudu/consensus/multi_raft_batcher.cc: http://gerrit.cloudera.org:8080/#/c/22867/7/src/kudu/consensus/multi_raft_batcher.cc@154 PS7, Line 154: to lock > nit: to lock Done http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/tserver/ts_tablet_manager.cc@520 PS5, Line 520: : InitLocalRaftPeerPB(); > What makes it impossible to create this instance in the constructor? Pleas Nothing -- To view, visit http://gerrit.cloudera.org:8080/22867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie92ba4de5eae00d56cd513cb644dce8fb6e14538 Gerrit-Change-Number: 22867 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Martonka <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Zoltan Chovan <[email protected]> Gerrit-Reviewer: Zoltan Martonka <[email protected]> Gerrit-Comment-Date: Fri, 20 Jun 2025 23:01:44 +0000 Gerrit-HasComments: Yes
