Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22867 )

Change subject: KUDU-3665 Send no-op heartbeat operations batched PART1
......................................................................


Patch Set 9:

(1 comment)

A status update on measurements:
+ I used release binaries.
+ I sampled 1 tablet server.
+ I accidentally set enable_multi_raft_heartbeat_batcher on only the 
measured tablet server (at least I set it 4 times :) ).
I added a new metric, rpcs_call_count (Number of RPCs calls received.). I will 
post a separate Gerrit for it.
I redid the measurements using the same method, but this time properly turning 
the heartbeat batcher on/off on all 4 tablet servers.
I also ran "sudo perf stat -e sched:sched_* -p <tserver-pid> sleep 40" on 
the tablet server before starting the workload.
Of the perf output, I include only "stat_runtime" and "switch" here.
Results are below. They were made with multi_raft_batch_size == 10, but look 
similar with 30 too.
The change affects write times very negatively, and I am unable to find out 
why, so there is still something wrong with the logic.

========== write count: 0:==========
Runs with batching off: 31, on: 21
    Off client task runtimes avg: n/a, min n/a, max n/a, med n/a
    On  client task runtimes avg: n/a, min n/a, max n/a, med n/a
    Change: n/a %
    Off cpu_stime avg: 15034.032258064517, min 9753, max 19892, med 15307
    On  cpu_stime avg: 9196.095238095239, min 5407, max 11005, med 9378
    Change: -38.83147860639787 %
    Off cpu_utime avg: 54317.83870967742, min 50460, max 90619, med 52890
    On  cpu_utime avg: 45386.142857142855, min 44258, max 46720, med 45326
    Change: -16.443393302655963 %
    Off no_op_heartbeat_count avg: 38645.6129032258, min 36218, max 41706, med 
38482
    On  no_op_heartbeat_count avg: 37650.619047619046, min 36229, max 39724, 
med 37521
    Change: -2.574661859027483 %
    Off heartbeat_batch_count avg: 0.0, min 0, max 0, med 0
    On  heartbeat_batch_count avg: 3765.0, min 3623, max 3973, med 3752
    Change: n/a %
    Off rpcs_call_count avg: 89091.48387096774, min 87311, max 92536, med 88682
    On  rpcs_call_count avg: 9271.761904761905, min 9138, max 9382, med 9285
    Change: -89.59298745299797 %
    Off utime+stime avg: 69351.87096774194, min 61306, max 108896, med 68555
    On  utime+stime avg: 54582.23809523809, min 50182, max 57472, med 54525
    Change: -21.296661022128347 %
    Off stat_runtime avg: 70581492875.77777, min 65923340831, max 97824681025, 
med 69212805945
    On  stat_runtime avg: 55426293258.210526, min 53530420671, max 58058363713, 
med 54927652048
    Change: -21.47191707072581 %
    Off switch avg: 454359.0740740741, min 436333, max 573155, med 448428
    On  switch avg: 261909.26315789475, min 256881, max 270650, med 261295
    Change: -42.356326063998516 %
========== write count: 3:==========
Runs with batching off: 25, on: 22
    Off client task runtimes avg: 7.018060319953495, min 6.286839008331299, max 
8.06680965423584, med 6.969934344291687
    On  client task runtimes avg: 7.445825029592045, min 6.156267881393433, max 
18.063499212265015, med 6.6766228675842285
    Change: 6.095198532596613 %
    Off cpu_stime avg: 15270.48, min 10430, max 18048, med 15792
    On  cpu_stime avg: 9025.09090909091, min 6131, max 12372, med 9667.5
    Change: -40.89844648569717 %
    Off cpu_utime avg: 57305.76, min 54034, max 60906, med 57272
    On  cpu_utime avg: 51961.86363636364, min 48248, max 80256, med 49876.0
    Change: -9.325234258539394 %
    Off no_op_heartbeat_count avg: 38914.32, min 36395, max 41815, med 38645
    On  no_op_heartbeat_count avg: 38502.681818181816, min 36722, max 41588, 
med 38449.0
    Change: -1.0578064368545692 %
    Off heartbeat_batch_count avg: 0.0, min 0, max 0, med 0
    On  heartbeat_batch_count avg: 3850.0454545454545, min 3671, max 4158, med 
3844.0
    Change: n/a %
    Off rpcs_call_count avg: 91895.6, min 90047, max 97232, med 91452
    On  rpcs_call_count avg: 11849.90909090909, min 10398, max 12972, med 
11773.0
    Change: -87.10503104511088 %
    Off utime+stime avg: 72576.24, min 64835, max 78624, med 73195
    On  utime+stime avg: 60986.954545454544, min 54796, max 92628, med 59509.0
    Change: -15.968429136788377 %
    Off stat_runtime avg: 73135547277.6, min 68310436116, max 77855412773, med 
73417558479.5
    On  stat_runtime avg: 62824669717.38461, min 57511759705, max 92188349332, 
med 60071950893
    Change: -14.098311893501636 %
    Off switch avg: 468075.9, min 451935, max 492367, med 468613.0
    On  switch avg: 284320.07692307694, min 264709, max 359979, med 278607
    Change: -39.25769796670221 %
========== write count: 6:==========
Runs with batching off: 23, on: 23
    Off client task runtimes avg: 7.766588854235272, min 6.901538848876953, max 
11.454949855804443, med 7.56096625328064
    On  client task runtimes avg: 8.483531257298988, min 6.517055034637451, max 
18.225762605667114, med 7.386918306350708
    Change: 9.231110549552945 %
    Off cpu_stime avg: 14428.260869565218, min 10254, max 18076, med 15655
    On  cpu_stime avg: 8722.478260869566, min 6196, max 12305, med 7337
    Change: -39.545879162272115 %
    Off cpu_utime avg: 60419.47826086957, min 58094, max 62810, med 60629
    On  cpu_utime avg: 55938.30434782609, min 51519, max 75305, med 54304
    Change: -7.416770290030284 %
    Off no_op_heartbeat_count avg: 38489.608695652176, min 36985, max 41631, 
med 38288
    On  no_op_heartbeat_count avg: 38860.086956521736, min 36551, max 41373, 
med 38599
    Change: 0.9625409907360494 %
    Off heartbeat_batch_count avg: 0.0, min 0, max 0, med 0
    On  heartbeat_batch_count avg: 3885.913043478261, min 3655, max 4137, med 
3858
    Change: n/a %
    Off rpcs_call_count avg: 94014.60869565218, min 91749, max 99570, med 93664
    On  rpcs_call_count avg: 14600.91304347826, min 13601, max 16585, med 14574
    Change: -84.46952739999705 %
    Off utime+stime avg: 74847.73913043478, min 68451, max 80029, med 75465
    On  utime+stime avg: 64660.782608695656, min 58510, max 86783, med 63456
    Change: -13.610239454242755 %
    Off stat_runtime avg: 74222107113.33333, min 69232974879, max 77875913972, 
med 74407758825
    On  stat_runtime avg: 65171043120.8, min 60442913148, max 80076678727, med 
63239178100.5
    Change: -12.19456620749505 %
    Off switch avg: 477589.3333333333, min 455591, max 507781, med 478782
    On  switch avg: 301474.1, min 285197, max 325362, med 296702.5
    Change: -36.8758724371287 %
========== write count: 9:==========
Runs with batching off: 29, on: 29
    Off client task runtimes avg: 9.231343838257517, min 7.870201349258423, max 
10.95453691482544, med 9.118125796318054
    On  client task runtimes avg: 11.740418131656059, min 7.183986663818359, 
max 34.84843873977661, med 9.250254392623901
    Change: 27.17994625008082 %
    Off cpu_stime avg: 14144.034482758621, min 10589, max 18293, med 12199
    On  cpu_stime avg: 9234.620689655172, min 6645, max 12417, med 8341
    Change: -34.710137330957124 %
    Off cpu_utime avg: 64025.620689655174, min 61148, max 68610, med 63779
    On  cpu_utime avg: 59655.724137931036, min 54155, max 90393, med 57403
    Change: -6.825231063211223 %
    Off no_op_heartbeat_count avg: 39123.34482758621, min 36488, max 41343, med 
39478
    On  no_op_heartbeat_count avg: 38768.93103448276, min 36351, max 41414, med 
38818
    Change: -0.9058882737795648 %
    Off heartbeat_batch_count avg: 0.0, min 0, max 0, med 0
    On  heartbeat_batch_count avg: 3876.793103448276, min 3635, max 4141, med 
3882
    Change: n/a %
    Off rpcs_call_count avg: 97298.62068965517, min 94500, max 103140, med 97170
    On  rpcs_call_count avg: 16838.51724137931, min 12361, max 19191, med 17025
    Change: -82.69398155695583 %
    Off utime+stime avg: 78169.6551724138, min 71877, max 86219, med 77362
    On  utime+stime avg: 68890.3448275862, min 61949, max 98556, med 66966
    Change: -11.87073209464825 %
    Off stat_runtime avg: 78446329426.23077, min 74690171750, max 81736141768, 
med 78625078998
    On  stat_runtime avg: 65925002460.53846, min 60491058980, max 74755623008, 
med 65483098107
    Change: -15.961647992041616 %
    Off switch avg: 509175.6923076923, min 485659, max 523400, med 513624
    On  switch avg: 308640.07692307694, min 273481, max 324254, med 312724
    Change: -39.384365438920575 %
========== write count: 12:==========
Runs with batching off: 21, on: 27
    Off client task runtimes avg: 11.195741676447684, min 9.398865222930908, 
max 13.112014532089233, med 11.085897207260132
    On  client task runtimes avg: 16.18674754301707, min 7.371404647827148, max 
50.03566360473633, med 12.2379709482193
    Change: 44.579501839247435 %
    Off cpu_stime avg: 12740.952380952382, min 10614, max 20413, med 11781
    On  cpu_stime avg: 9867.185185185184, min 6500, max 17699, med 8674
    Change: -22.555356223318558 %
    Off cpu_utime avg: 69119.28571428571, min 66646, max 72023, med 68595
    On  cpu_utime avg: 71110.92592592593, min 57992, max 127982, med 63442
    Change: 2.8814536942307845 %
    Off no_op_heartbeat_count avg: 38265.619047619046, min 35952, max 41726, 
med 38342
    On  no_op_heartbeat_count avg: 37984.62962962963, min 35968, max 40598, med 
37800
    Change: -0.7343130072970827 %
    Off heartbeat_batch_count avg: 0.0, min 0, max 0, med 0
    On  heartbeat_batch_count avg: 3798.0, min 3595, max 4059, med 3780
    Change: n/a %
    Off rpcs_call_count avg: 99220.80952380953, min 95253, max 107668, med 98848
    On  rpcs_call_count avg: 19433.14814814815, min 14003, max 21986, med 19673
    Change: -80.41424148682754 %
    Off utime+stime avg: 81860.23809523809, min 78064, max 91841, med 81106
    On  utime+stime avg: 80978.11111111111, min 64492, max 138860, med 73753
    Change: -1.0776012929509138 %
    Off stat_runtime avg: 85271334809.33333, min 81844881169, max 90219901298, 
med 83749221961
    On  stat_runtime avg: 78795578908.75, min 69260478819, max 139333670620, 
med 72663701198.5
    Change: -7.594294043904803 %
    Off switch avg: 536528.0, min 527925, max 552615, med 529044
    On  switch avg: 351314.0, min 305501, max 466844, med 342029.0
    Change: -34.52084513762562 %
========== write count: 15:==========
Runs with batching off: 25, on: 20
    Off client task runtimes avg: 13.315507302298391, min 7.887014150619507, 
max 17.39699411392212, med 13.22769570350647
    On  client task runtimes avg: 17.306377828968646, min 7.778277158737183, 
max 38.203365325927734, med 13.348968982696533
    Change: 29.97159954980755 %
    Off cpu_stime avg: 16944.16, min 10779, max 20886, med 18255
    On  cpu_stime avg: 9530.3, min 7169, max 13511, med 8915.5
    Change: -43.754662373348694 %
    Off cpu_utime avg: 72017.32, min 67842, max 80895, med 71362
    On  cpu_utime avg: 81336.05, min 59646, max 127720, med 70689.0
    Change: 12.939567870617786 %
    Off no_op_heartbeat_count avg: 38303.08, min 36163, max 41426, med 38076
    On  no_op_heartbeat_count avg: 38184.35, min 36008, max 40705, med 38341.5
    Change: -0.3099750725007011 %
    Off heartbeat_batch_count avg: 0.0, min 0, max 0, med 0
    On  heartbeat_batch_count avg: 3817.9, min 3602, max 4070, med 3833.0
    Change: n/a %
    Off rpcs_call_count avg: 101364.36, min 96882, max 111742, med 100902
    On  rpcs_call_count avg: 20991.65, min 13309, max 25414, med 22285.0
    Change: -79.29089672149067 %
    Off utime+stime avg: 88961.48, min 81042, max 101345, med 88830
    On  utime+stime avg: 90866.35, min 66815, max 138292, med 79662.0
    Change: 2.141230114427062 %
    Off stat_runtime avg: 86484412773.05556, min 80817575237, max 92561047931, 
med 85952700227.5
    On  stat_runtime avg: 74323239696.75, min 72183603156, max 77104294145, med 
74002530743.0
    Change: -14.061693531084952 %
    Off switch avg: 550984.2777777778, min 510471, max 578879, med 547104.0
    On  switch avg: 351021.5, min 337027, max 359830, med 353614.5
    Change: -36.29192081201752 %
========== write count: 18:==========
Runs with batching off: 33, on: 28
    Off client task runtimes avg: 15.55151865447777, min 10.442912101745605, 
max 17.809950351715088, med 15.4842209815979
    On  client task runtimes avg: 17.810001448615566, min 7.6174256801605225, 
max 41.84770750999451, med 15.779654741287231
    Change: 14.522586792432058 %
    Off cpu_stime avg: 15028.515151515152, min 11182, max 21980, med 12643
    On  cpu_stime avg: 11925.535714285714, min 7181, max 16504, med 12684.0
    Change: -20.647278895790322 %
    Off cpu_utime avg: 75328.78787878787, min 70750, max 79946, med 75837
    On  cpu_utime avg: 84688.46428571429, min 67951, max 130711, med 73700.5
    Change: 12.425098917013155 %
    Off no_op_heartbeat_count avg: 38082.63636363636, min 35765, max 40906, med 
37991
    On  no_op_heartbeat_count avg: 39006.32142857143, min 37042, max 41557, med 
38799.5
    Change: 2.4254756317686477 %
    Off heartbeat_batch_count avg: 0.0, min 0, max 0, med 0
    On  heartbeat_batch_count avg: 3900.0714285714284, min 3704, max 4155, med 
3879.5
    Change: n/a %
    Off rpcs_call_count avg: 104423.66666666667, min 98909, max 114806, med 
103892
    On  rpcs_call_count avg: 23487.85714285714, min 11475, max 33926, med 
24702.5
    Change: -77.5071514986796 %
    Off utime+stime avg: 90357.30303030302, min 83827, max 101653, med 89136
    On  utime+stime avg: 96614.0, min 76280, max 141804, med 85011.5
    Change: 6.924395438848663 %
    Off stat_runtime avg: 89981924996.46153, min 86780506135, max 97900308762, 
med 88902721693
    On  stat_runtime avg: 86791590625.14285, min 77391916725, max 109831401699, 
med 80007359494.0
    Change: -3.5455280284836466 %
    Off switch avg: 575425.9230769231, min 538726, max 632636, med 576262
    On  switch avg: 392245.21428571426, min 348678, max 448921, med 388296.5
    Change: -31.833934038234357 %
========== write count: 21:==========
Runs with batching off: 25, on: 20
    Off client task runtimes avg: 17.921707342960143, min 15.899797439575195, 
max 20.06899070739746, med 17.800525426864624
    On  client task runtimes avg: 21.518317450417413, min 9.616166114807129, 
max 56.135162115097046, med 18.425913333892822
    Change: 20.068456864239902 %
    Off cpu_stime avg: 16407.92, min 11806, max 22693, med 13530
    On  cpu_stime avg: 11225.3, min 6749, max 16991, med 9553.0
    Change: -31.586087694235466 %
    Off cpu_utime avg: 79608.28, min 74292, max 90022, med 78988
    On  cpu_utime avg: 81990.7, min 66423, max 100659, med 79813.5
    Change: 2.9926786510146908 %
    Off no_op_heartbeat_count avg: 38046.6, min 35760, max 40876, med 37660
    On  no_op_heartbeat_count avg: 38645.45, min 35904, max 41069, med 38437.0
    Change: 1.5739908428085592 %
    Off heartbeat_batch_count avg: 0.0, min 0, max 0, med 0
    On  heartbeat_batch_count avg: 3864.05, min 3588, max 4107, med 3843.5
    Change: n/a %
    Off rpcs_call_count avg: 106745.32, min 102366, max 116243, med 105547
    On  rpcs_call_count avg: 25628.9, min 16396, max 35509, med 26500.0
    Change: -75.99061017382309 %
    Off utime+stime avg: 96016.2, min 88008, max 104646, med 95372
    On  utime+stime avg: 93216.0, min 73172, max 111354, med 93585.0
    Change: -2.916382860392308 %
    Off stat_runtime avg: 94365864015.33333, min 89240010631, max 101257555324, 
med 94649426293.5
    On  stat_runtime avg: 98003354298.125, min 79640161827, max 118015688815, 
med 96154973604.0
    Change: 3.8546674909908374 %
    Off switch avg: 595410.3333333334, min 579012, max 632898, med 592973.5
    On  switch avg: 423931.625, min 364130, max 487969, med 423498.0
    Change: -28.800089406129448 %
========== write count: 24:==========
Runs with batching off: 25, on: 26
    Off client task runtimes avg: 19.930121208600816, min 12.758413553237915, 
max 23.080135107040405, med 19.86552882194519
    On  client task runtimes avg: 30.194614356451662, min 10.386216878890991, 
max 75.54526543617249, med 25.969217777252197
    Change: 51.50241205468043 %
    Off cpu_stime avg: 16747.2, min 12088, max 23070, med 13862
    On  cpu_stime avg: 12294.653846153846, min 6928, max 20600, med 10795.5
    Change: -26.586809459767334 %
    Off cpu_utime avg: 82548.92, min 77349, max 89319, med 82652
    On  cpu_utime avg: 97315.46153846153, min 60597, max 126110, med 96920.0
    Change: 17.888231049493484 %
    Off no_op_heartbeat_count avg: 38060.0, min 35884, max 40325, med 38114
    On  no_op_heartbeat_count avg: 38877.730769230766, min 36051, max 40675, 
med 39124.0
    Change: 2.1485306600913523 %
    Off heartbeat_batch_count avg: 0.0, min 0, max 0, med 0
    On  heartbeat_batch_count avg: 3886.846153846154, min 3605, max 4068, med 
3911.0
    Change: n/a %
    Off rpcs_call_count avg: 108843.12, min 104785, max 114990, med 108363
    On  rpcs_call_count avg: 25725.153846153848, min 12254, max 39224, med 
28476.5
    Change: -76.3649242633307 %
    Off utime+stime avg: 99296.12, min 89945, max 108275, med 98930
    On  utime+stime avg: 109610.11538461539, min 71324, max 143142, med 107480.0
    Change: 10.38710816154287 %
    Off stat_runtime avg: 98093822746.91667, min 92825678182, max 104345264148, 
med 96791625522.5
    On  stat_runtime avg: 109955212562.0, min 66941529744, max 148793787339, 
med 106608131698.5
    Change: 12.091882529326913 %
    Off switch avg: 624215.5, min 604361, max 645910, med 619230.5
    On  switch avg: 442398.7, min 305941, max 507563, med 452103.0
    Change: -29.12724852234525 %
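
The "Change" lines above are consistent with the relative difference of the 
two averages, (on_avg - off_avg) / off_avg * 100, and the fractional "med" 
values suggest the median averages the two middle samples when the run count 
is even. A minimal sketch of that summary follows. The names Summary, 
Summarize and PercentChange are hypothetical and are not taken from the 
measurement script, which is not part of this change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical container for the four figures printed per metric.
struct Summary {
  double avg;
  double min;
  double max;
  double med;
};

// Summarizes one metric over all runs of one configuration (on or off).
// Assumption: the median of an even number of runs is the mean of the two
// middle samples, which matches values such as "med 9667.5" above.
Summary Summarize(std::vector<double> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
  const double med = (n % 2 == 1)
      ? samples[n / 2]
      : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
  return {sum / static_cast<double>(n), samples.front(), samples.back(), med};
}

// Relative change of the "on" value versus the "off" value, in percent.
// A negative result means the value dropped with batching on.
double PercentChange(double off, double on) {
  assert(off != 0.0);
  return (on - off) / off * 100.0;
}
```

For example, PercentChange(15034.032258064517, 9196.095238095239) evaluates 
to roughly -38.83, which agrees with the cpu_stime line for write count 0.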

http://gerrit.cloudera.org:8080/#/c/22867/9/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/22867/9/src/kudu/consensus/consensus_peers.cc@451
PS9, Line 451:   CHECK(request_pending_.load(std::memory_order_relaxed) != 
RequestStatus::NO_ACTIVE);
> Why is it memory_order_relaxed in this CHECK while it's not in others?
We are holding peer_lock_ here, so relaxed ordering is enough.
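
A minimal sketch of that reasoning, not the actual consensus_peers.cc code: 
it assumes every write to the flag happens while the same mutex is held. 
Under that assumption the unlock/lock pair of the mutex already orders the 
accesses, so a relaxed load inside the critical section observes the most 
recent locked write. PeerSketch, TryStart, Finish, IsPending and the ACTIVE 
enumerator are hypothetical names; only request_pending_, peer_lock_ and 
RequestStatus::NO_ACTIVE appear in the quoted line.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// ACTIVE is a hypothetical second state, added only for illustration.
enum class RequestStatus { NO_ACTIVE, ACTIVE };

class PeerSketch {
 public:
  // Marks a request as in flight. Returns false if one is already pending.
  bool TryStart() {
    std::lock_guard<std::mutex> guard(peer_lock_);
    // Relaxed is sufficient here: the mutex provides the ordering.
    if (request_pending_.load(std::memory_order_relaxed) !=
        RequestStatus::NO_ACTIVE) {
      return false;
    }
    request_pending_.store(RequestStatus::ACTIVE, std::memory_order_relaxed);
    return true;
  }

  // Completes the pending request. Mirrors the quoted CHECK with an assert.
  void Finish() {
    std::lock_guard<std::mutex> guard(peer_lock_);
    assert(request_pending_.load(std::memory_order_relaxed) !=
           RequestStatus::NO_ACTIVE);
    request_pending_.store(RequestStatus::NO_ACTIVE,
                           std::memory_order_relaxed);
  }

  // Reads the flag under the lock, again with a relaxed load.
  bool IsPending() {
    std::lock_guard<std::mutex> guard(peer_lock_);
    return request_pending_.load(std::memory_order_relaxed) !=
           RequestStatus::NO_ACTIVE;
  }

 private:
  std::mutex peer_lock_;
  std::atomic<RequestStatus> request_pending_{RequestStatus::NO_ACTIVE};
};
```

If any write to the flag happened without the lock, the assumption would no 
longer hold and a relaxed load could observe an older value, so such a path 
would need its own ordering.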



--
To view, visit http://gerrit.cloudera.org:8080/22867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie92ba4de5eae00d56cd513cb644dce8fb6e14538
Gerrit-Change-Number: 22867
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Thu, 17 Jul 2025 14:12:39 +0000
Gerrit-HasComments: Yes
