Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20853

to look at the new patch set (#2).

Change subject: IMPALA-12670: getIfPresent should throw the cause of error
......................................................................

IMPALA-12670: getIfPresent should throw the cause of error

CatalogdMetaProvider maintains a map (a Guava cache) as its local
catalog cache. It has a piggyback mechanism to load metadata from
catalogd that when concurrent threads want to load the same content
(identified by the same key) from catalogd, only one of them actually
sends the request and load the result into the cache. Other threads wait
and get the result when the work is done.

The piggyback mechanism is implemented by putting a Future object as the
value when the key doesn't exist in the cache. The Future object handles
the loading. Other threads that want the same value just invoke
Future.get() to wait. See more in the comments in loadWithCaching().

If there are any errors thrown in the loading process, Future.get() will
encapsulate the error into an ExecutionException and throw it instead.
The cause could be an InconsistentMetadataFetchException which indicates
FE should retry the planning. It's handled in Frontend#getTExecRequest().

In loadWithCaching(), we try to throw the cause of the exception thrown
from Future.get(). So the InconsistentMetadataFetchException can be
handled as expected. However, in getIfPresent(), the error handling is
inconsistent that it try to throw the current exception. That causes
retriable failures can't be retried. Note that this is an existing bug
but got more easy to be hitted after IMPALA-11501 because getIfPresent()
is now used in LocalDb#getTableIfCached() which is used in many places.

This patch fixes getIfPresent() to have the same logic of using the
Future object (including error handling) as loadWithCaching(). Also
adds more loggings in both catalogd and impalad sides when the lookup
status is abnormal.

In order to test the loading error more easily, this patch adds a hidden
flag, inject_failure_ratio_in_catalog_fetch, to randomly inject
retriable errors.

Tests
 - Ran test_local_catalog_ddls_with_invalidate_metadata 700 times.
 - Add e2e test that will easily fail without this fix.

Change-Id: I74268ba2bb700988107780e13ffbdbb4c767d09d
---
M be/src/exec/catalog-op-executor.cc
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M tests/custom_cluster/test_local_catalog.py
4 files changed, 61 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/20853/2
--
To view, visit http://gerrit.cloudera.org:8080/20853
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I74268ba2bb700988107780e13ffbdbb4c767d09d
Gerrit-Change-Number: 20853
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>

Reply via email to