Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20409 )


Change subject: KUDU-3500 don't start operations timed out in prepare queue
......................................................................

KUDU-3500 don't start operations timed out in prepare queue

While troubleshooting a performance issue where the prepare queue for
a tablet was very long, I noticed that tablet servers start write
operations that correspond to RPCs that have already timed out.  Most
likely, the client that sent the RPC had already detected the timeout
and expected that the write would have failed already.

As a simple optimization, this patch updates the logic of the OpDriver
class to respond with TimedOut error status right away when a write
operation that has already timed out while waiting in the prepare queue
is dispatched to the prepare thread.  That helps with clearing the queue
and processing not-yet-timed-out requests from the queue faster,
increasing the overall robustness of a tablet server when the load
is high and the node's CPU and disk IO bandwidth are saturated.

A new tablet metric 'ops_timed_out_in_prepare_queue' is introduced to
track the number of WriteRequestPB RPCs that timed out in the tablet's
prepare queue and were answered with TimedOut error status before the
PREPARE phase of the corresponding operation had started.
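
For illustration only, the check described above can be sketched in
standalone C++ roughly as follows.  This is not the code from the patch:
PendingOp, TabletCounters, and PrepareOrReject are hypothetical names, the
status is a plain string rather than a Kudu Status object, and the counter
is a plain integer standing in for the 'ops_timed_out_in_prepare_queue'
metric.

```cpp
#include <chrono>
#include <cstdint>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a write operation waiting in the prepare queue.
struct PendingOp {
  Clock::time_point deadline;       // deadline carried over from the client RPC
  bool prepared = false;            // true once the PREPARE phase has been run
  std::string status = "Pending";   // simplified status, not a Kudu Status
};

// Hypothetical stand-in for the per-tablet metric added by this change.
struct TabletCounters {
  std::uint64_t ops_timed_out_in_prepare_queue = 0;
};

// Called when the prepare thread picks an operation off the queue.
// Returns true if the operation proceeded to PREPARE, false if it was
// rejected because its deadline had already passed while it was queued.
bool PrepareOrReject(PendingOp& op, TabletCounters& counters,
                     Clock::time_point now) {
  if (now >= op.deadline) {
    // The client has most likely given up already: respond right away
    // instead of spending CPU and disk IO on the write.
    op.status = "TimedOut";
    ++counters.ops_timed_out_in_prepare_queue;
    return false;
  }
  op.prepared = true;
  op.status = "Prepared";
  return true;
}
```

In this sketch the current time is passed in as a parameter so that the
behavior is easy to exercise in a test; whether the actual patch structures
the check this way is not stated in the description above.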

This patch also adds a new test to cover the new functionality.

Change-Id: I202ce6b5e425439b50c0751d7f7406e69b8e751a
Reviewed-on: http://gerrit.cloudera.org:8080/20300
Tested-by: Kudu Jenkins
Reviewed-by: Abhishek Chennaka <[email protected]>
(cherry picked from commit 6c049687f60e90cbdac6f6ec039a528d13664a6b)
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_service.cc
11 files changed, 170 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/20409/1
--
To view, visit http://gerrit.cloudera.org:8080/20409
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.17.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I202ce6b5e425439b50c0751d7f7406e69b8e751a
Gerrit-Change-Number: 20409
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>
