Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20707 )
Change subject: IMPALA-12561: Event-processor shouldn't go into ERROR state for failures in fetching events
......................................................................

IMPALA-12561: Event-processor shouldn't go into ERROR state for failures in fetching events

Any failures in fetching HMS events should be retriable. The
event-processor should not go into the ERROR state, which can only be
recovered by a global INVALIDATE METADATA command.

This patch deals with failures in creating a new MetaStoreClient by
throwing a MetastoreClientInstantiationException instead of an
IllegalStateException. Previously, the IllegalStateException could fail
the process of fetching HMS events. Now callers can catch the
MetastoreClientInstantiationException and convert it into a
MetastoreNotificationFetchException if the process is retriable, so the
event-processor can retry in the next round. There are still other
callers of Catalog#getMetaStoreClient() that don't catch the new
exception, since their work can't be easily retried.

Also makes sure MetastoreEventsProcessor.getCurrentEventId() only throws
MetastoreNotificationFetchException. Previously it threw
CatalogException, which would fail the event-processor. Note that
CatalogException is used for errors in accessing objects in the Catalog,
e.g. table not found. It shouldn't be thrown when fetching HMS events
fails.

Tests:
 - Add an FE unit test to verify MetastoreNotificationFetchException is
   thrown as expected. To mimic HMS connection failures, use a customized
   MetastoreClientPool that uses a wrong HMS port.
 - Add an e2e test in custom_cluster/test_catalog_hms_failures.py. The
   test class previously only ran in exhaustive jobs due to its long
   running time. Optimize the test to only restart HMS. Add a new option,
   -if_not_running, for run-hive-server.sh to avoid unnecessary restarts.
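The retry pattern described above can be sketched in Java as follows. This is a minimal, non-authoritative illustration, not the code from the patch: the two nested exception classes only borrow the names used in the message, and `HmsClient`, `ClientPool`, `FailingPool`, `FixedIdPool`, `Processor` and `pollOnce()` are hypothetical. It also assumes the instantiation exception is unchecked, like the IllegalStateException it replaces. `FailingPool` stands in for a pool pointed at a wrong HMS port: a failed client creation is converted into the retriable fetch exception, so each polling round logs the failure and the processor stays ACTIVE.

```java
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch, NOT Impala's code: every type below is a simplified,
// hypothetical stand-in used to illustrate the retry pattern.
public class EventProcessorSketch {

  // Stand-in for the new exception. Assumed unchecked, like the
  // IllegalStateException it replaces, so callers that can't retry need not catch it.
  static class MetastoreClientInstantiationException extends RuntimeException {
    MetastoreClientInstantiationException(String msg, Throwable cause) { super(msg, cause); }
  }

  // Stand-in for the checked exception the event-processor treats as retriable.
  static class MetastoreNotificationFetchException extends Exception {
    MetastoreNotificationFetchException(String msg, Throwable cause) { super(msg, cause); }
  }

  // Hypothetical minimal HMS client interface.
  interface HmsClient extends AutoCloseable {
    long getCurrentNotificationEventId() throws Exception;
    @Override void close(); // narrowed: close() throws no checked exception
  }

  // Hypothetical client pool: getClient() throws when no client can be created.
  interface ClientPool { HmsClient getClient(); }

  // Mimics a pool configured with a wrong HMS port: client creation always fails.
  static class FailingPool implements ClientPool {
    @Override public HmsClient getClient() {
      throw new MetastoreClientInstantiationException(
          "Failed to connect to HMS", new ConnectException("Connection refused"));
    }
  }

  // Healthy pool whose clients report a fixed event id.
  static class FixedIdPool implements ClientPool {
    private final long id;
    FixedIdPool(long id) { this.id = id; }
    @Override public HmsClient getClient() {
      return new HmsClient() {
        @Override public long getCurrentNotificationEventId() { return id; }
        @Override public void close() {}
      };
    }
  }

  enum State { ACTIVE, ERROR }

  static class Processor {
    private final ClientPool pool;
    private final List<String> log = new ArrayList<>();
    private State state = State.ACTIVE;

    Processor(ClientPool pool) { this.pool = pool; }

    // Contract: only ever throws the retriable fetch exception.
    long getCurrentEventId() throws MetastoreNotificationFetchException {
      try (HmsClient client = pool.getClient()) {
        return client.getCurrentNotificationEventId();
      } catch (MetastoreClientInstantiationException e) {
        // Client creation failed: convert into the retriable exception.
        throw new MetastoreNotificationFetchException("Unable to create HMS client", e);
      } catch (Exception e) {
        throw new MetastoreNotificationFetchException("Unable to fetch current event id", e);
      }
    }

    // One polling round. A fetch failure is logged and retried next round.
    void pollOnce() {
      try {
        log.add("latest event id " + getCurrentEventId());
      } catch (MetastoreNotificationFetchException e) {
        log.add("fetch failed, will retry: " + e.getMessage()); // state stays ACTIVE
      } catch (RuntimeException e) {
        // Any other unexpected runtime failure (none in this sketch) means ERROR.
        state = State.ERROR;
        log.add("unexpected failure: " + e);
      }
    }

    State getState() { return state; }
    List<String> getLog() { return log; }
  }

  public static void main(String[] args) {
    Processor p = new Processor(new FailingPool());
    p.pollOnce();
    p.pollOnce();
    System.out.println(p.getState() + " " + p.getLog());
  }
}
```

The design choice worth noting is where the conversion happens: only a caller that knows its work is retriable (here `getCurrentEventId()`) wraps the unchecked instantiation exception, while other callers can let it propagate unchanged.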
Change-Id: I775684d473fdbfb9f0531234f59a6239bd0873e3
Reviewed-on: http://gerrit.cloudera.org:8080/20707
Reviewed-by: Riza Suminto
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
A fe/src/main/java/org/apache/impala/catalog/MetastoreClientInstantiationException.java
M fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
A fe/src/test/java/org/apache/impala/testutil/IncompetentMetastoreClientPool.java
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_catalog_hms_failures.py
10 files changed, 246 insertions(+), 47 deletions(-)

Approvals:
  Riza Suminto: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/20707

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I775684d473fdbfb9f0531234f59a6239bd0873e3
Gerrit-Change-Number: 20707
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Anonymous Coward
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Sai Hemanth Gantasala
