Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/22867
to look at the new patch set (#3).
Change subject: Add option to send no-op heartbeat operations batched PART1
......................................................................
Add option to send no-op heartbeat operations batched PART1
Due to the periodically sent heartbeat messages, a Kudu cluster with
thousands of tablets uses significant CPU and network resources even
when there is no user activity.
When multiple messages are sent to the same host within a short time
frame, they can be batched to reduce the CPU impact. This results in
fewer RPC calls, and some fields can be shared between the no-op
messages.
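To illustrate the batching idea only: the sketch below buffers no-op
heartbeats per destination tserver and flushes them as one RPC once the
batch is large or old enough. All names in it (NoOpHeartbeat,
HeartbeatBatch, MultiRaftBatcherSketch, the thresholds) are made up for
this example and do not mirror the actual interface added in
multi_raft_batcher.h.

    #include <chrono>
    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct NoOpHeartbeat {
      std::string tablet_id;   // per-tablet fields stay in the batch entry
      int64_t committed_index;
    };

    struct HeartbeatBatch {
      std::string dest_uuid;   // shared fields are sent once per RPC
      std::vector<NoOpHeartbeat> entries;
      std::chrono::steady_clock::time_point first_added;
    };

    class MultiRaftBatcherSketch {
     public:
      // Instead of one RPC per tablet, buffer the no-op heartbeat and send
      // the whole batch once it is large enough or old enough.
      void AddNoOpHeartbeat(const std::string& dest_uuid, NoOpHeartbeat hb) {
        HeartbeatBatch& batch = batches_[dest_uuid];
        if (batch.entries.empty()) {
          batch.dest_uuid = dest_uuid;
          batch.first_added = std::chrono::steady_clock::now();
        }
        batch.entries.push_back(std::move(hb));
        if (batch.entries.size() >= kMaxBatchSize ||
            Age(batch) >= kMaxBatchDelay) {
          Flush(&batch);
        }
      }

     private:
      static constexpr size_t kMaxBatchSize = 25;
      static constexpr std::chrono::milliseconds kMaxBatchDelay{10};

      static std::chrono::milliseconds Age(const HeartbeatBatch& batch) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - batch.first_added);
      }

      void Flush(HeartbeatBatch* batch) {
        // The real code would issue a single batched consensus RPC here;
        // this sketch just clears the buffer.
        batch->entries.clear();
      }

      std::unordered_map<std::string, HeartbeatBatch> batches_;
    };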
Measurement:
I launched an AWS cluster with 1 master and 4 tservers (t3.xlarge
instances), then created 2000 tablets (RF=3) on them. cpu_stime seems
to decrease by 10-15% while the same number of no-op messages arrive.
Here is one result with the flag turned off vs. on (change in metrics
over 500 sec):
Metric: no_op_heartbeat_count, off: 920739, on: 919910, inc: -0.0900%
Metric: heartbeat_batch_count, off: 0,      on: 30664,  inc: n/a
Metric: cpu_stime,             off: 81208,  on: 64079,  inc: -21.093%
Metric: cpu_utime,             off: 176558, on: 170430, inc: -3.471%
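For clarity, the inc column is simply the relative change between the
off and on runs, (on - off) / off * 100; for example:

    #include <cstdio>

    // Relative change between the flag-off and flag-on runs, matching the
    // "inc" column above.
    double IncPercent(double off, double on) {
      return (on - off) / off * 100.0;
    }

    int main() {
      std::printf("cpu_stime inc: %.3f%%\n",
                  IncPercent(81208, 64079));  // prints -21.093%
      return 0;
    }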
The next 2 parts needed:
1) Process responses in multiple threads:
If we start writing to multiple tablets that share the same buffer at
the same time, then after the flush, when their responses arrive, the
new heartbeats (now carrying operations) are sent out on the same
thread (unbatched, so their responses are processed on multiple
threads and Kudu returns to normal).
This would rarely cause problems on a typical cluster. However, if you
have a 3-tserver setup with a single table having 30 hash-partitioned
tablets, it can add multiple seconds of delay to the write operation
(without increasing the overall CPU consumption): if all 30 heartbeats
are waiting in the buffer, one of the writes will flush it. When the
response arrives, we process it on a single thread and send out all 30
updates with actual operations on that single thread.
Possible solution:
+ Keep track of whether we are called in batch mode; if there have
already been 1-2 "send_more_immediately" cases, request a callback
instead of sending the message immediately (a rough sketch of both
follow-ups appears after item 2 below).
2) If a write request finds a no-op message still in the buffer, it
should discard that message instead of flushing the buffer. This would
make the problem in 1) appear much less frequently (and would
stabilize the unit tests that are currently flaky with
enable_multi_raft_heartbeat_batcher=1), so it should be done after 1)
is implemented so that it does not hide the problem.
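A rough sketch of both follow-ups above; every name in it (PeerSketch,
MaybeSendMoreImmediately, DiscardPendingNoOpHeartbeat, the threshold)
is hypothetical and not taken from the patch:

    #include <functional>

    class PeerSketch {
     public:
      // Follow-up 1): while processing a batched response, after a couple
      // of immediate sends hand the next send over to another thread via a
      // callback instead of sending synchronously on this thread.
      // (immediate_sends_in_batch_ would be reset per batched response.)
      void MaybeSendMoreImmediately(
          bool in_batch_mode,
          const std::function<void()>& schedule_on_pool) {
        if (in_batch_mode &&
            ++immediate_sends_in_batch_ > kMaxImmediateSendsPerBatch) {
          schedule_on_pool();  // defer the send to another thread
          return;
        }
        SendNextRequest();
      }

      // Follow-up 2): a write that finds a stale no-op heartbeat for this
      // peer in the batch buffer drops that entry rather than flushing the
      // whole batch.
      void PrepareWriteRequest() {
        DiscardPendingNoOpHeartbeat();  // instead of flushing the buffer
        SendNextRequest();
      }

     private:
      static constexpr int kMaxImmediateSendsPerBatch = 2;
      int immediate_sends_in_batch_ = 0;

      void SendNextRequest() {}              // placeholder for the real send path
      void DiscardPendingNoOpHeartbeat() {}  // placeholder
    };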
Change-Id: Ie92ba4de5eae00d56cd513cb644dce8fb6e14538
---
M src/kudu/client/client-test.cc
M src/kudu/consensus/CMakeLists.txt
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
A src/kudu/consensus/multi_raft_batcher.cc
A src/kudu/consensus/multi_raft_batcher.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
23 files changed, 765 insertions(+), 36 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/22867/3
--
To view, visit http://gerrit.cloudera.org:8080/22867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie92ba4de5eae00d56cd513cb644dce8fb6e14538
Gerrit-Change-Number: 22867
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)