Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22867 )

Change subject: KUDU-3665 Send no-op heartbeat operations batched PART1
......................................................................


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/22867/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22867/7//COMMIT_MSG@7
PS7, Line 7: Send no-op
> one send is enough :)
Done


http://gerrit.cloudera.org:8080/#/c/22867/7//COMMIT_MSG@15
PS7, Line 15: fewer RPC calls
> Do we want to capture the number of rpc calls between the servers as a part
If there is no client activity, then the number of RPC calls sent to other 
tablet servers is exactly the same as no_op_heartbeat_count when batching is 
turned off, or heartbeat_batch_count when batching is turned on, since this is 
the only activity occurring.
I experimented with multi_raft_batch_size and took measurements with a value of 
30. Then I pushed the default value of 10 (there was no significant 
difference). I will update it to 30 so that the measurements in the commit 
message reflect the default value.

This also shows that even batching such a very small and easy-to-process 
message only resulted in a ~12% improvement.
So, I think instead of generic RPC batching, the next step should be to 
optimize the response time for these no-op messages.

In case there is write/read activity, we should decide on the size and number 
of parallel operations.
The measurements below were taken with 2000 tablets and 5 parallel writes (10 
hash splits). So, 1950 tablets still send normal no-ops, while 50 tablets send 
heartbeats with actual operations.
Is this the setup we want to measure, or should we use a different one?


http://gerrit.cloudera.org:8080/#/c/22867/7//COMMIT_MSG@29
PS7, Line 29: if
> nit: if
Done


http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/consensus/multi_raft_batcher.h
File src/kudu/consensus/multi_raft_batcher.h:

http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/consensus/multi_raft_batcher.h@18
PS5, Line 18: #pragma once
            :
> style nit: in new code, please use '#pragma once' instead of include guard
Done


http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/consensus/multi_raft_batcher.h@104
PS5, Line 104:
             :   // Timer to flush the buffer if it was not flushed by the 
batch size for
             :   // FLAGS_multi_raft_heartbeat_interval_ms ms.
             :   std::shared_ptr<rpc::PeriodicTimer> batch_sender_;
             :
             :   // Mutex for data in current_batch_
             :   std::mutex current_batch_mutex_;
> Please document all these non-trivial fields.
Done


http://gerrit.cloudera.org:8080/#/c/22867/7/src/kudu/consensus/multi_raft_batcher.cc
File src/kudu/consensus/multi_raft_batcher.cc:

http://gerrit.cloudera.org:8080/#/c/22867/7/src/kudu/consensus/multi_raft_batcher.cc@154
PS7, Line 154: to lock
> nit: to lock
Done


http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/22867/5/src/kudu/tserver/ts_tablet_manager.cc@520
PS5, Line 520:
             :   InitLocalRaftPeerPB();
> What makes it impossible to create this instance in the constructor?  Pleas
Nothing



--
To view, visit http://gerrit.cloudera.org:8080/22867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie92ba4de5eae00d56cd513cb644dce8fb6e14538
Gerrit-Change-Number: 22867
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Fri, 20 Jun 2025 23:01:44 +0000
Gerrit-HasComments: Yes

Reply via email to