Todd Lipcon has posted comments on this change.
Change subject: consensus_peers: schedule raft heartbeats through messenger
......................................................................
Patch Set 2:
OK, I ran the following experiment:
- built a release build of master
- started a three-node cluster as follows:
kudu-master -fs_wal_dir /data/1/todd/master -rpc_bind_addresses 0.0.0.0:6050 \
  -webserver_port 6051 &

kudu-tserver -fs_wal_dir /data/1/todd/ts1 \
  -fs_data_dirs $(echo /data/{2..8}/todd/ts1 | xargs -n1 echo | paste -s -d, -) \
  -rpc_bind_addresses 0.0.0.0:5050 -webserver_port 0 -logtostderr \
  -tserver_master_addrs localhost:6050 -superuser_acl '*' \
  --noenable_leader_failure_detection -unlock-unsafe-flags &

kudu-tserver -fs_wal_dir /data/1/todd/ts2 \
  -fs_data_dirs $(echo /data/{2..8}/todd/ts2 | xargs -n1 echo | paste -s -d, -) \
  -rpc_bind_addresses 0.0.0.0:5052 -webserver_port 0 -logtostderr \
  -tserver_master_addrs localhost:6050 -superuser_acl '*' \
  --noenable_leader_failure_detection -unlock-unsafe-flags &

kudu-tserver -fs_wal_dir /data/1/todd/ts3 \
  -fs_data_dirs $(echo /data/{2..8}/todd/ts3 | xargs -n1 echo | paste -s -d, -) \
  -rpc_bind_addresses 0.0.0.0:5053 -webserver_port 0 -logtostderr \
  -tserver_master_addrs localhost:6050 -superuser_acl '*' &
(Disabling leader failure detection on the first two servers forces the third
server to become leader for all tablets.)

I then ran the following command repeatedly until I had 240 tablets per server:
kudu perf loadgen -table_num_replicas=3 -keep_auto_table \
  -table_num_buckets 30 localhost:6050 -num_rows_per_thread 1
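For concreteness, the repeated loadgen runs amount to a loop like the sketch
below. The iteration count of 8 is an assumption on my part (each run creates a
table with 30 hash buckets, i.e. 30 tablets, so 8 runs reach 240); the leading
'echo' keeps this a dry run:

```shell
# Sketch of the repeated loadgen invocations: each run creates a table with
# 30 hash buckets (30 tablets), so 8 runs reach 240 tablets per server.
# The leading 'echo' makes this a dry run; remove it to actually run.
n_tablets=0
for i in $(seq 1 8); do
  echo kudu perf loadgen -table_num_replicas=3 -keep_auto_table \
       -table_num_buckets 30 localhost:6050 -num_rows_per_thread 1
  n_tablets=$((n_tablets + 30))
done
echo "created ${n_tablets} tablets"
```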
I then waited a bit for all of the compactions, flushes, etc. to finish, and
restarted the cluster to make sure everything was fresh.
I verified that 'ts3' above had 240 leaders and the others had 0.
Measurements:
Without patch:
- leader thread count: 840
- 480 heartbeats/sec received by each replica (based on UpdateConsensus RPC metrics)
- 4611 voluntary context switches/sec on leader (based on metrics)
- 113.8 ms/sec of user CPU on leader (based on metrics)
- 117.3 ms/sec of system CPU on leader (based on metrics)
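(The per-second rates above are deltas of cumulative counters between two
metric snapshots; a sketch of that arithmetic, with counter values invented for
illustration:)

```shell
# Rate from two snapshots of a cumulative counter, taken dt seconds apart:
# rate = (c2 - c1) / dt. The counter values here are invented.
c1=120000          # counter at first snapshot
c2=124800          # counter 10 seconds later
dt=10
echo "$(( (c2 - c1) / dt )) events/sec"   # 480 events/sec
```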
perf stat -I1000 on leader as a sanity check:
     3.212441481    183.004618  task-clock (msec)
     3.212441481         4,829  context-switches    #  0.026 M/sec
     3.212441481           181  cpu-migrations      #  0.987 K/sec
     3.212441481             0  page-faults         #  0.000 K/sec
     3.212441481   207,979,470  cycles              #  1.134 GHz
     3.212441481    72,630,019  instructions        #  0.35 insn per cycle
     3.212441481    13,339,338  branches            # 72.749 M/sec
     3.212441481     1,103,621  branch-misses       #  8.16% of all branches
With patch:
- leader thread count: 386
- 638 heartbeats/sec received by each replica (based on replica RPC metrics)
- 7363 voluntary context switches/sec on leader (based on metrics)
- 174 ms/sec of user CPU on leader (based on metrics)
- 132 ms/sec of system CPU on leader (based on metrics)
perf stat -I1000 on leader as a sanity check:
     7.070447162    280.141355  task-clock (msec)
     7.070447162         7,223  context-switches    #  0.026 M/sec
     7.070447162           110  cpu-migrations      #  0.389 K/sec
     7.070447162             2  page-faults         #  0.007 K/sec
     7.070447162   316,788,687  cycles              #  1.119 GHz
     7.070447162   106,365,459  instructions        #  0.33 insn per cycle
     7.070447162    19,182,491  branches            # 67.764 M/sec
     7.070447162     1,895,180  branch-misses       #  9.76% of all branches
So the new code has the intended effect on thread count (perhaps obviously),
but it also has unintended effects: the context-switch rate is up 50-60%, CPU
consumption is up 33% by our metrics (50% by perf stat), and the heartbeat rate
is up 33%.
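For reference, the percentage deltas can be re-derived from the metrics-based
figures above (the CPU figure combines user and system time):

```shell
# Re-deriving the deltas from the with/without-patch measurements above.
awk 'BEGIN {
  printf "heartbeats/sec:   %+.0f%%\n", (638 - 480) / 480 * 100
  printf "ctx switches/sec: %+.0f%%\n", (7363 - 4611) / 4611 * 100
  printf "user+sys CPU:     %+.0f%%\n", \
      ((174 + 132) - (113.8 + 117.3)) / (113.8 + 117.3) * 100
}'
```

(The CPU delta comes out at roughly +32%, i.e. the ~+33% quoted above after
rounding.)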
I also checked /tracing.html output on a replica and can see that the
heartbeats are still well spread out in time with the patch.
Any idea why the resulting heartbeat rate is higher than before with this patch?
--
To view, visit http://gerrit.cloudera.org:8080/7331
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac8e09fe02dd32885ef0cf644cb093b1c8e6afb8
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: No