Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21146 )

Change subject: IMPALA-12699: Set timeout for catalog RPCs
......................................................................

IMPALA-12699: Set timeout for catalog RPCs

We have seen trivial GetPartialCatalogObject RPCs hang on the
coordinator side, e.g. IMPALA-11409. Due to the piggyback mechanism for
fetching metadata in local-catalog mode (see comments in
CatalogdMetaProvider#loadWithCaching()), a hanging RPC on shared
metadata (e.g. db/table list) could block other queries on the same
coordinator.

Such lightweight requests don't need to acquire a table lock or trigger
table loading in catalogd. The hangs are usually caused by network
issues, e.g. a TCP connection becoming half-open after TCP
retransmissions time out. Retrying the RPC helps recover from such
failures. Currently, the timeout for catalog RPCs is 0 by default, which
prevents the retry and lets the client wait indefinitely.
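
As a rough illustration of why a finite timeout matters, here is a
minimal Python sketch (not Impala code). The names call_with_retry and
FlakyRpc are hypothetical, and timeout_s=None stands in for a timeout of
0, i.e. no timeout. A retry can only happen once an attempt has timed
out, so with no timeout the retry path is never reached.

```python
def call_with_retry(do_rpc, timeout_s, max_attempts=3):
    """Run do_rpc(timeout_s), retrying whenever an attempt times out.

    Returns (result, number_of_attempts). timeout_s=None models "no
    timeout": the call can never raise TimeoutError, so no retry occurs.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return do_rpc(timeout_s), attempt
        except TimeoutError as e:
            # A real client would reopen the connection before retrying.
            last_error = e
    raise last_error


class FlakyRpc:
    """Hypothetical stand-in: the first call hits a half-open connection."""

    def __init__(self):
        self.calls = 0

    def __call__(self, timeout_s):
        self.calls += 1
        if self.calls == 1:
            if timeout_s is None:
                # Simulates the hang; a real call would block here forever.
                raise RuntimeError("would block forever without a timeout")
            raise TimeoutError("no response within %s s" % timeout_s)
        return "ok"
```

With a finite timeout, call_with_retry(FlakyRpc(), timeout_s=1800)
succeeds on the second attempt; with timeout_s=None the simulated hang
is surfaced as an error instead of being retried.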

This patch distinguishes the lightweight catalog RPCs and uses a
dedicated catalogd client cache for them. They use a timeout of 30
minutes, which is long enough to tolerate TCP retransmission timeouts.
The patch also sets a timeout of 10 hours for other catalog RPCs.
Operations that take longer than that are usually abnormal and hanging.
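
The split described above can be sketched in Python as two client
caches with different timeouts. This is an illustrative sketch only: the
real change is in the C++ backend (be/src/runtime/client-cache.h and
exec-env.h), and ClientCache, pick_cache, the cache names and the
constant names here are hypothetical. Only the 30 minute and 10 hour
values come from the commit message.

```python
# Timeouts from the commit message, expressed in milliseconds.
LIGHTWEIGHT_RPC_TIMEOUT_MS = 30 * 60 * 1000      # 30 minutes
DEFAULT_RPC_TIMEOUT_MS = 10 * 60 * 60 * 1000     # 10 hours


class ClientCache:
    """Hypothetical stand-in for a pool of catalogd clients."""

    def __init__(self, name, rpc_timeout_ms):
        self.name = name
        self.rpc_timeout_ms = rpc_timeout_ms


# One cache per RPC class, so a hung heavy operation cannot tie up the
# clients used for lightweight requests.
lightweight_cache = ClientCache("lightweight", LIGHTWEIGHT_RPC_TIMEOUT_MS)
default_cache = ClientCache("default", DEFAULT_RPC_TIMEOUT_MS)


def pick_cache(is_lightweight):
    """Choose the client cache for a request.

    How a request is classified as lightweight is up to the caller; this
    sketch does not try to reproduce the patch's classification.
    """
    return lightweight_cache if is_lightweight else default_cache
```

A caller would then use pick_cache(True).rpc_timeout_ms for a
lightweight request and pick_cache(False).rpc_timeout_ms for
everything else.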

Tests
 - Add an e2e test to verify the lightweight RPC client cache is used.
 - Adjust TestRestart.test_catalog_connection_retries to use local
   catalog mode, since in legacy catalog mode the coordinator only sends
   PrioritizeLoad requests, which are lightweight RPCs.

This is a continuation of a patch by Wenzhe Zhou <[email protected]>

Change-Id: Iad39a79d0c89f2b04380f610a7e60558429e9c6e
Reviewed-on: http://gerrit.cloudera.org:8080/21146
Reviewed-by: Wenzhe Zhou <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/catalog-op-executor.cc
M be/src/runtime/client-cache.cc
M be/src/runtime/client-cache.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M common/thrift/metrics.json
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_restart_services.py
8 files changed, 83 insertions(+), 11 deletions(-)

Approvals:
  Wenzhe Zhou: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/21146
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iad39a79d0c89f2b04380f610a7e60558429e9c6e
Gerrit-Change-Number: 21146
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
