Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14307 )
Change subject: IMPALA-7506: support global INVALIDATE METADATA in local catalog mode
......................................................................

IMPALA-7506: support global INVALIDATE METADATA in local catalog mode

In local catalog mode, the coordinator does not cache all the metadata. Instead, it caches metadata on demand (based on query requests) and evicts it according to the Guava cache configuration (e.g. size or TTL). We use the catalog version as part of the cache key for fine-grained metadata, e.g. partition metadata. When invalidating a table, we simply invalidate the top-level table entry and allow the other information to remain in the cache. The old metadata is removed lazily by the Guava cache, since it will not be accessed anymore. As a result, the cache can hold a lot of stale metadata, so we cannot efficiently track the minimal catalog version of valid catalog objects. That minimal catalog version is what is used to implement global INVALIDATE METADATA. In legacy catalog mode, by contrast, all cached catalog objects are in fact valid, so this minimum is straightforward to track there.

The coordinator gets the expected min catalog version in the RPC response from catalogd. It is the catalog version at which catalogd starts to reset the entire catalog, so once the reset is done, all valid catalog objects are associated with a catalog version larger than it. The coordinator waits until its min catalog version exceeds this value, which means it has processed all the updates of the reset propagated from catalogd via statestored. If SYNC_DDL is set, the coordinator also waits until the other coordinators reach the same catalog version, so that they can also see the latest update of the reset.

This patch adds a new field (lastResetCatalogVersion) in TCatalog to keep the catalog version at which catalogd starts to reset the entire metadata.
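The coordinator-side wait described above can be sketched in Java. This is a hypothetical, simplified model, not the actual CatalogdMetaProvider code: the class MinCatalogVersionTracker and its methods onTopicUpdateProcessed, minCatalogVersion and waitForMinCatalogVersion are illustrative names, and the timeout parameter is an assumption. It only shows the rule that the min catalog version is lastResetCatalogVersion + 1 and that a global INVALIDATE METADATA waits until that value exceeds the version returned by catalogd.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch (not Impala's actual CatalogdMetaProvider code) of how a
 * local-catalog-mode coordinator could track its min catalog version from the
 * lastResetCatalogVersion values it sees in catalog topic updates.
 */
class MinCatalogVersionTracker {
  // Largest lastResetCatalogVersion from a topic update that has been fully applied.
  private long lastResetCatalogVersion = 0;

  /** Called once the coordinator has finished processing an entire topic update. */
  synchronized void onTopicUpdateProcessed(long lastResetInUpdate) {
    if (lastResetInUpdate > lastResetCatalogVersion) {
      lastResetCatalogVersion = lastResetInUpdate;
      notifyAll(); // wake up any thread waiting for a global INVALIDATE METADATA
    }
  }

  /** Everything at or below the last reset version is stale, so +1 is an inclusive lower bound. */
  synchronized long minCatalogVersion() {
    return lastResetCatalogVersion + 1;
  }

  /**
   * Blocks until minCatalogVersion() exceeds the version returned by catalogd in the
   * reset RPC response, i.e. until this coordinator has applied the whole reset.
   */
  synchronized void waitForMinCatalogVersion(long expectedVersion, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (minCatalogVersion() <= expectedVersion) {
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime());
      if (remainingMs <= 0) {
        throw new TimeoutException("min catalog version is still " + minCatalogVersion());
      }
      wait(remainingMs); // releases the monitor; the loop re-checks the condition on wake-up
    }
  }
}
```

The SYNC_DDL step (waiting for the other coordinators to reach the same catalog version) is not modeled in this sketch.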
Each time catalogd generates a new topic update for the catalog topic, it also generates a TCatalogObject of type CATALOG describing the state of the catalog, which includes this new field. When a coordinator receives a new value of lastResetCatalogVersion in a topic update, it means catalogd has reset the entire catalog and all the related updates are included in the same topic update. This is guaranteed because catalogd holds the write lock of versionLock while resetting the entire catalog, so the topic update thread, which must hold the read lock of versionLock, has no chance to propagate partial results. Thus, all metadata with a catalog version <= lastResetCatalogVersion can be considered stale once the coordinator finishes processing the topic update, and lastResetCatalogVersion + 1 is the (inclusive) lower bound of the coordinator's min catalog version.

Tests:
 - Re-enable all existing tests that were disabled due to this missing feature

Change-Id: Ib61a7ab1ffa062620ffbc2dadc34bd7a8ca9e549
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M tests/authorization/test_ranger.py
M tests/common/skip.py
M tests/custom_cluster/test_local_catalog.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
9 files changed, 52 insertions(+), 75 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/14307/1
--
To view, visit http://gerrit.cloudera.org:8080/14307

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib61a7ab1ffa062620ffbc2dadc34bd7a8ca9e549
Gerrit-Change-Number: 14307
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <[email protected]>
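The versionLock argument in the commit message can be sketched in Java. This is a hypothetical, simplified model, not the real CatalogServiceCatalog: CatalogdModel, TopicUpdate, updateTable, resetAll and buildTopicUpdate are illustrative names, and modeling versionLock as a java.util.concurrent ReentrantReadWriteLock is an assumption. It shows why a topic update that carries a new lastResetCatalogVersion also carries every object reloaded by that reset: the reset runs entirely under the write lock, while building an update only takes the read lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** One catalog topic update in this simplified (hypothetical) model. */
class TopicUpdate {
  final List<String> deltas;          // "name@version" for objects with version > fromVersion
  final long toVersion;               // catalog version this update is consistent with
  final long lastResetCatalogVersion; // as carried by the CATALOG-type TCatalogObject

  TopicUpdate(List<String> deltas, long toVersion, long lastResetCatalogVersion) {
    this.deltas = deltas;
    this.toVersion = toVersion;
    this.lastResetCatalogVersion = lastResetCatalogVersion;
  }
}

/**
 * Hypothetical model of catalogd: a global reset runs under versionLock's write
 * lock and building a topic update takes the read lock, so an update sees either
 * none of a reset or all of it.
 */
class CatalogdModel {
  private final ReentrantReadWriteLock versionLock = new ReentrantReadWriteLock();
  private final Map<String, Long> objectVersions = new HashMap<>();
  private long catalogVersion = 0;
  private long lastResetCatalogVersion = 0;

  /** A single-table change bumps the catalog version. */
  void updateTable(String name) {
    versionLock.writeLock().lock();
    try {
      objectVersions.put(name, ++catalogVersion);
    } finally {
      versionLock.writeLock().unlock();
    }
  }

  /** Global reset; returns the version at which it started (sent back to the coordinator). */
  long resetAll(List<String> tableNames) {
    versionLock.writeLock().lock();
    try {
      // Assumption: the reset bumps the version so each reset gets a distinct marker.
      long resetStart = ++catalogVersion;
      lastResetCatalogVersion = resetStart;
      objectVersions.clear(); // the catalog is rebuilt from the given table list
      for (String name : tableNames) {
        objectVersions.put(name, ++catalogVersion); // every reloaded object is > resetStart
      }
      return resetStart;
    } finally {
      versionLock.writeLock().unlock();
    }
  }

  /** Collects objects changed after fromVersion; it only reads, so the read lock suffices. */
  TopicUpdate buildTopicUpdate(long fromVersion) {
    versionLock.readLock().lock();
    try {
      List<String> deltas = new ArrayList<>();
      for (Map.Entry<String, Long> e : objectVersions.entrySet()) {
        if (e.getValue() > fromVersion) {
          deltas.add(e.getKey() + "@" + e.getValue());
        }
      }
      return new TopicUpdate(deltas, catalogVersion, lastResetCatalogVersion);
    } finally {
      versionLock.readLock().unlock();
    }
  }
}
```

In this model the reset sends no explicit deletion for tables dropped by the rebuild; a coordinator that has applied the update treats every object at or below lastResetCatalogVersion as stale, which is why lastResetCatalogVersion + 1 can serve as its min catalog version.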
