Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20707 )

Change subject: IMPALA-12561: Event-processor shouldn't go into ERROR state for 
failures in fetching events
......................................................................

IMPALA-12561: Event-processor shouldn't go into ERROR state for failures in 
fetching events

Any failures in fetching HMS events should be retriable. Event-processor
should not go into the ERROR state, which can only be recovered from
by a global INVALIDATE METADATA command.

This patch deals with the failure in creating a new MetaStoreClient
by throwing a MetastoreClientInstantiationException instead of an
IllegalStateException. Previously the IllegalStateException could fail
the process of fetching HMS events. Now callers can catch the
MetastoreClientInstantiationException and convert it into
MetastoreNotificationFetchException if the process is retriable, so the
event-processor can retry in the next round. There are still other
callers of Catalog#getMetaStoreClient() that don't catch the new
exception since their work can't be easily retried.
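
The catch-and-convert pattern described above could look roughly like
the following sketch. It is not the code from this patch: the two
exception classes are simplified stand-ins that only reuse the names
from the commit message (their superclasses here are assumptions), and
EventFetchSketch, fetchCurrentEventId and pollOnce are hypothetical
names introduced for illustration.

```java
import java.util.function.Supplier;

// Simplified stand-in: assumed here to be unchecked, so that callers
// which cannot retry are not forced to catch it.
class MetastoreClientInstantiationException extends RuntimeException {
  MetastoreClientInstantiationException(String msg, Throwable cause) {
    super(msg, cause);
  }
}

// Simplified stand-in: signals a retriable failure in fetching events.
class MetastoreNotificationFetchException extends Exception {
  MetastoreNotificationFetchException(String msg, Throwable cause) {
    super(msg, cause);
  }
}

// Hypothetical helper class, not part of Impala.
class EventFetchSketch {
  // Wraps an HMS call. A failure to create the client is converted
  // into the retriable fetch exception instead of escaping as-is.
  static long fetchCurrentEventId(Supplier<Long> hmsCall)
      throws MetastoreNotificationFetchException {
    try {
      return hmsCall.get();
    } catch (MetastoreClientInstantiationException e) {
      throw new MetastoreNotificationFetchException(
          "Unable to fetch the current event id from HMS", e);
    }
  }

  // One polling round. On a fetch failure the last synced event id is
  // kept, so the next round simply tries again.
  static long pollOnce(Supplier<Long> hmsCall, long lastSyncedEventId) {
    try {
      return fetchCurrentEventId(hmsCall);
    } catch (MetastoreNotificationFetchException e) {
      return lastSyncedEventId;
    }
  }
}
```

In this sketch a caller whose work is retriable opts in by catching the
instantiation exception, while other callers can leave it unhandled.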

Also makes sure MetastoreEventsProcessor.getCurrentEventId() only throws
MetastoreNotificationFetchException. Previously it threw
CatalogException, which would fail the event-processor. Note that
CatalogException is used for errors in accessing objects in the Catalog,
e.g. table not found. We shouldn't throw it when fetching HMS events
fails.

Tests:
 - Add FE unit test to verify MetastoreNotificationFetchException is
   thrown as expected. To mimic HMS connection failures, use a
   customized MetastoreClientPool that uses a wrong HMS port.
 - Add e2e test in custom_cluster/test_catalog_hms_failures.py. The test
   class previously ran only in exhaustive jobs due to its long running
   time. Optimize the test to only restart HMS. Add a new option,
   -if_not_running, to run-hive-server.sh to avoid unnecessary
   restarts.

Change-Id: I775684d473fdbfb9f0531234f59a6239bd0873e3
Reviewed-on: http://gerrit.cloudera.org:8080/20707
Reviewed-by: Riza Suminto <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
A fe/src/main/java/org/apache/impala/catalog/MetastoreClientInstantiationException.java
M fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
A fe/src/test/java/org/apache/impala/testutil/IncompetentMetastoreClientPool.java
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_catalog_hms_failures.py
10 files changed, 246 insertions(+), 47 deletions(-)

Approvals:
  Riza Suminto: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/20707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I775684d473fdbfb9f0531234f59a6239bd0873e3
Gerrit-Change-Number: 20707
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
