Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20300 )
Change subject: KUDU-3500 don't start operations timed out in prepare queue
......................................................................

KUDU-3500 don't start operations timed out in prepare queue

While troubleshooting a performance issue where the prepare queue for
a tablet was very long, I noticed that tablet servers start write
operations that correspond to RPCs that have already timed out. Most
likely, the client that sent the RPC had already detected the timeout
and expected that the write would have failed already.

As a simple optimization, this patch updates the logic of the OpDriver
class to respond with TimedOut error status right away when a write
operation that has already timed out while waiting in the prepare queue
is dispatched to the prepare thread. That helps with clearing the queue
and processing not-yet-timed-out requests from the queue faster,
increasing the overall robustness of a tablet server when the load is
high and the node's CPU and disk IO bandwidth are saturated.

A new tablet metric 'ops_timed_out_in_prepare_queue' is introduced
to track the number of WriteRequestPB RPCs timed out in the tablet's
prepare queue and responded with TimedOut error status even before
starting the PREPARE phase for the corresponding operation.

This patch also adds a new test to cover the new functionality.
Change-Id: I202ce6b5e425439b50c0751d7f7406e69b8e751a
Reviewed-on: http://gerrit.cloudera.org:8080/20300
Tested-by: Kudu Jenkins
Reviewed-by: Abhishek Chennaka <[email protected]>
---
M src/kudu/tablet/ops/op_driver.cc
M src/kudu/tablet/ops/op_driver.h
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_service.cc
11 files changed, 170 insertions(+), 21 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Abhishek Chennaka: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/20300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I202ce6b5e425439b50c0751d7f7406e69b8e751a
Gerrit-Change-Number: 20300
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
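
The early-rejection behavior described in the commit message can be illustrated with a rough, self-contained sketch. This is not the code from the patch: PendingWrite, PrepareQueueStats, DispatchToPrepare, and DrainQueue are hypothetical names invented for illustration, and only the counter name 'ops_timed_out_in_prepare_queue' comes from the change description. The sketch also assumes that an operation whose deadline equals the current time counts as timed out; the actual patch may draw that boundary differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical stand-ins for illustration; these are not Kudu's types.
using Clock = std::chrono::steady_clock;

enum class OpResult { kPrepared, kTimedOut };

struct PendingWrite {
  std::string id;
  Clock::time_point deadline;  // deadline inherited from the client's RPC
};

struct PrepareQueueStats {
  // Mirrors the role of the 'ops_timed_out_in_prepare_queue' tablet metric.
  uint64_t ops_timed_out_in_prepare_queue = 0;
  uint64_t ops_prepared = 0;
};

// Invoked when the prepare thread picks up a queued write. If the deadline
// has already passed, the operation is rejected before any PREPARE work.
OpResult DispatchToPrepare(const PendingWrite& op,
                           Clock::time_point now,
                           PrepareQueueStats* stats) {
  assert(stats != nullptr);
  if (now >= op.deadline) {  // assumption: deadline == now counts as expired
    ++stats->ops_timed_out_in_prepare_queue;
    return OpResult::kTimedOut;
  }
  ++stats->ops_prepared;  // a real server would run the PREPARE phase here
  return OpResult::kPrepared;
}

// Drains the whole queue at a fixed point in time and reports the counters.
PrepareQueueStats DrainQueue(std::deque<PendingWrite>* queue,
                             Clock::time_point now) {
  assert(queue != nullptr);
  PrepareQueueStats stats;
  while (!queue->empty()) {
    DispatchToPrepare(queue->front(), now, &stats);
    queue->pop_front();
  }
  return stats;
}
```

In this sketch, expired entries cost only a timestamp comparison and a counter increment, which is the reason the queue clears faster under load: the prepare thread spends its time on requests whose clients are still waiting for a response.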
