Hello Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16060

to look at the new patch set (#4).

Change subject: [tests] add same_tablet_concurrent_writes test
......................................................................

[tests] add same_tablet_concurrent_writes test

Added the SameTabletConcurrentWritesTest.InsertsOnly test scenario.
The scenario exercises concurrent inserts from multiple clients into
the same tablet.

The purpose of the newly introduced test is to check for lock
contention when running multiple write operations on the same tablet
concurrently. Threads pushing Raft consensus updates interact with
RPC worker threads serving write requests, and the test pinpoints the
contention over the lock primitives used in RaftConsensus.

To validate the results reported by the test, I verified that RPC queue
overflows happen less often when using the lock-free implementation of
RaftConsensus::CheckLeadershipAndBindTerm() from the patch posted here:

  https://gerrit.cloudera.org/#/c/16034/

The rate of successful write operations was the same in both cases.
That's expected, since the bottleneck is the WAL (where an additional
static delay is introduced for each fsync). However, the number of
messages from spinlock_profiling.cc like

  Waited 190 ms on lock 0x237acd4 ...

dropped significantly after applying patch 16034 on top. Less
contention is good news: the freed CPU resources can be spent on
something useful, like handling another RPC request from the queue
(which no longer overflows and can accommodate extra requests).

Below are snippets of various measurements done for this new test
before and after applying the patch from review item 16034 on top.
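As a side illustration of the contention pattern described in the commit message (separate from the measurements that follow), here is a minimal C++ sketch contrasting a mutex-guarded leadership check with a lock-free one. Everything in it is an assumption made for illustration: LockedState, LockFreeState, BecomeLeader, CountSuccessfulChecks and the kNotLeader sentinel are hypothetical names, and the code is not taken from Kudu or from patch 16034, whose actual implementation may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sentinel meaning "this replica is not the leader".
constexpr int64_t kNotLeader = -1;

// Illustrative stand-in (NOT Kudu's RaftConsensus): every check takes a
// mutex, so readers serialize with each other and with any writer.
class LockedState {
 public:
  void BecomeLeader(int64_t term) {
    std::lock_guard<std::mutex> l(lock_);
    leader_term_ = term;
  }
  // Returns true and binds the current term if this replica is the leader.
  bool CheckLeadershipAndBindTerm(int64_t* term) {
    std::lock_guard<std::mutex> l(lock_);
    if (leader_term_ == kNotLeader) return false;
    *term = leader_term_;
    return true;
  }
 private:
  std::mutex lock_;
  int64_t leader_term_ = kNotLeader;
};

// Illustrative lock-free variant: leadership and term are folded into one
// atomic value, so a check is a single atomic load with no lock taken.
class LockFreeState {
 public:
  void BecomeLeader(int64_t term) {
    leader_term_.store(term, std::memory_order_release);
  }
  bool CheckLeadershipAndBindTerm(int64_t* term) const {
    const int64_t t = leader_term_.load(std::memory_order_acquire);
    if (t == kNotLeader) return false;
    *term = t;
    return true;
  }
 private:
  std::atomic<int64_t> leader_term_{kNotLeader};
};

// Runs the check from several threads at once (mimicking RPC workers
// handling writes to the same tablet) and counts successful checks.
template <typename State>
int64_t CountSuccessfulChecks(State* state, int num_threads,
                              int checks_per_thread) {
  std::atomic<int64_t> ok{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([state, checks_per_thread, &ok] {
      for (int j = 0; j < checks_per_thread; ++j) {
        int64_t term = 0;
        if (state->CheckLeadershipAndBindTerm(&term)) {
          ok.fetch_add(1, std::memory_order_relaxed);
        }
      }
    });
  }
  for (auto& t : threads) t.join();
  return ok.load();
}
```

Both variants return the same answers; the difference is only that the second never blocks a reader on a lock, which is the kind of change that would be expected to show up in contention counters rather than in the write rate when the WAL is the bottleneck.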
========================================================================

Without 16034 patch:

 Performance counter stats for './bin/same_tablet_concurrent_writes-itest':

       7449.425640 task-clock         #    0.504 CPUs utilized
            47,882 context-switches   #    0.006 M/sec
             3,454 cpu-migrations     #    0.464 K/sec
            28,592 page-faults        #    0.004 M/sec
    10,211,586,270 cycles             #    1.371 GHz
    10,647,306,766 instructions       #    1.04  insns per cycle
     1,861,229,149 branches           #  249.849 M/sec
        25,370,590 branch-misses      #    1.36% of all branches

      14.767762000 seconds time elapsed

With 16034 patch:

 Performance counter stats for './bin/same_tablet_concurrent_writes-itest':

       5646.543970 task-clock         #    0.394 CPUs utilized
            39,194 context-switches   #    0.007 M/sec
             3,715 cpu-migrations     #    0.658 K/sec
            30,090 page-faults        #    0.005 M/sec
     8,543,832,082 cycles             #    1.513 GHz
     9,301,870,856 instructions       #    1.09  insns per cycle
     1,590,579,357 branches           #  281.691 M/sec
        18,563,203 branch-misses      #    1.17% of all branches

      14.339274728 seconds time elapsed

========================================================================

------------------------------------------------------------------------
                         | Without 16034 patch     | With 16034 patch
------------------------------------------------------------------------
write RPC request rate   | 15.8 req/sec            | 16 req/sec
RPC queue overflows      | 1898                    | 50
spinlock_contention_time | 22966310                | 9161557
------------------------------------------------------------------------
rpc_incoming_queue_time  |                         |
                         | Count: 82               | Count: 82
                         | Mean: 199704            | Mean: 1037.87
                         | Percentiles:            | Percentiles:
                         |   0% (min)   = 35       |   0% (min)   = 21
                         |   25%        = 5844     |   25%        = 51
                         |   50% (med)  = 196096   |   50% (med)  = 67
                         |   75%        = 388352   |   75%        = 1608
                         |   95%        = 400640   |   95%        = 3334
                         |   99%        = 598016   |   99%        = 9960
                         |   99.9%      = 599552   |   99.9%      = 10064
                         |   99.99%     = 599552   |   99.99%     = 10064
                         |   100% (max) = 600048   |   100% (max) = 10066
------------------------------------------------------------------------
op_apply_run_time        |                         |
                         | Count: 79               | Count: 80
                         | Mean: 99377.1           | Mean: 80796.3
                         | Percentiles:            | Percentiles:
                         |   0% (min)   = 429      |   0% (min)   = 575
                         |   25%        = 940      |   25%        = 828
                         |   50% (med)  = 1336     |   50% (med)  = 1064
                         |   75%        = 200704   |   75%        = 200704
                         |   95%        = 391168   |   95%        = 200704
                         |   99%        = 401408   |   99%        = 200704
                         |   99.9%      = 401408   |   99.9%      = 399360
                         |   99.99%     = 401408   |   99.99%     = 399360
                         |   100% (max) = 401432   |   100% (max) = 399703
------------------------------------------------------------------------
handler_latency_kudu_tserver_TabletServerService_Write:
                         | Count: 49               | Count: 45
                         | Mean: 3.11688e+06       | Mean: 3.08435e+06
                         | Percentiles:            | Percentiles:
                         |   0% (min)   = 611494   |   0% (min)   = 636928
                         |   25%        = 1802240  |   25%        = 2392064
                         |   50% (med)  = 3391488  |   50% (med)  = 3178496
                         |   75%        = 4390912  |   75%        = 3997696
                         |   95%        = 4816896  |   95%        = 4587520
                         |   99%        = 5013504  |   99%        = 4587520
                         |   99.9%      = 5013504  |   99.9%      = 4587520
                         |   99.99%     = 5013504  |   99.99%     = 4587520
                         |   100% (max) = 5023673  |   100% (max) = 4616174
------------------------------------------------------------------------

Change-Id: I7eef6e46e7685450354473cee9d804c5054723eb
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/same_tablet_concurrent_writes-itest.cc
2 files changed, 341 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/16060/4
--
To view, visit http://gerrit.cloudera.org:8080/16060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7eef6e46e7685450354473cee9d804c5054723eb
Gerrit-Change-Number: 16060
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>