Impala Public Jenkins has submitted this change and it was merged.
( http://gerrit.cloudera.org:8080/18439 )

Change subject: IMPALA-11263: Coordinator hang when cancelling a query
......................................................................

IMPALA-11263: Coordinator hang when cancelling a query

In rare cases, the callback Coordinator::BackendState::ExecCompleteCb()
is not called for the corresponding ExecQueryFInstances RPC when the
RPC is cancelled. This causes the coordinator to wait indefinitely when
calling Coordinator::BackendState::Cancel() to cancel a fragment
instance.

This patch adds a timeout to BackendState::WaitOnExecLocked() so that
the coordinator is not blocked indefinitely when cancelling a query.
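
The idea of a bounded wait can be sketched as follows. This is only an
illustration and not the code in this patch: the class ExecRpcState, its
members, and the method names are hypothetical, and it uses the standard
library's std::condition_variable rather than Impala's own primitives.
Unlike WaitOnExecLocked(), this sketch acquires the lock itself.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for per-backend state; not Impala's actual class.
class ExecRpcState {
 public:
  // Would be invoked by the RPC completion callback. In the failure mode
  // described above, this call never happens.
  void MarkExecComplete() {
    {
      std::lock_guard<std::mutex> l(lock_);
      exec_done_ = true;
    }
    exec_done_cv_.notify_all();
  }

  // Waits until the Exec RPC is marked complete or 'timeout' elapses.
  // Returns true if the RPC completed, false if the wait timed out.
  bool WaitOnExec(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    // The predicate guards against spurious wakeups.
    return exec_done_cv_.wait_for(l, timeout, [this] { return exec_done_; });
  }

 private:
  std::mutex lock_;
  std::condition_variable exec_done_cv_;
  bool exec_done_ = false;
};
```

With a wait of this shape, a caller on the cancellation path can treat a
false return as "the callback never arrived" and proceed instead of
blocking forever.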

Testing:
 - Added a test case that simulates the missing callback when a query
   fails. Verified that the coordinator hangs without the fix and does
   not hang with it.
 - Passed exhaustive-debug tests.

Change-Id: I915511afe2df3017cbbf37f6aff3c5ff7f5473be
Reviewed-on: http://gerrit.cloudera.org:8080/18439
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M tests/custom_cluster/test_rpc_timeout.py
3 files changed, 155 insertions(+), 97 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/18439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I915511afe2df3017cbbf37f6aff3c5ff7f5473be
Gerrit-Change-Number: 18439
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
