Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20853
Change subject: IMPALA-12670: getIfPresent should throw the cause of error ...................................................................... IMPALA-12670: getIfPresent should throw the cause of error CatalogdMetaProvider maintains a map (a Guava cache) as its local catalog cache. It has a piggyback mechanism to load metadata from catalogd that when concurrent threads want to load the same content (identified by the same key) from catalogd, only one of them actually sends the request and load the result into the cache. Other threads wait and get the result when the work is done. The piggyback mechanism is implemented by putting a Future object as the value when the key doesn't exist in the cache. The Future object handles the loading. Other threads that want the same value just invoke Future.get() to wait. See more in the comments in loadWithCaching(). If there are any errors thrown in the loading process, Future.get() will encapsulate the error into an ExecutionException and throw it instead. The cause could be an InconsistentMetadataFetchException which indicates FE should retry the planning. It's handled in Frontend#getTExecRequest(). In loadWithCaching(), we try to throw the cause of the exception thrown from Future.get(). So the InconsistentMetadataFetchException can be handled as expected. However, in getIfPresent(), the error handling is inconsistent that it try to throw the current exception. That causes retriable failures can't be retried. Note that this is an existing bug but got more easy to be hitted after IMPALA-11501 because getIfPresent() is now used in LocalDb#getTableIfCached() which is used in many places. This patch fixes getIfPresent() to have the same logic of using the Future object (including error handling) as loadWithCaching(). Also adds more loggings in both catalogd and impalad sides when the lookup status is abnormal. In order to test the loading error more easily, this patch adds a hidden flag, inject_failure_ratio_in_catalog_fetch, to randomly inject retriable errors. Tests - Ran test_local_catalog_ddls_with_invalidate_metadata 700 times. - Add e2e test that will easily fail without this fix. Change-Id: I74268ba2bb700988107780e13ffbdbb4c767d09d --- M be/src/exec/catalog-op-executor.cc M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M tests/custom_cluster/test_local_catalog.py 4 files changed, 60 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/20853/1 -- To view, visit http://gerrit.cloudera.org:8080/20853 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I74268ba2bb700988107780e13ffbdbb4c767d09d Gerrit-Change-Number: 20853 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang <[email protected]>
