Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/20742 )
Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on masked tables ...................................................................... Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG@39 PS1, Line 39: doesn' > nit: not always, as getTableIfCached() will return the correct table getTab Ah, right. Let me rephrase this. http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG@40 PS1, Line 40: ble, ge > nit: doesn't Done http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG@41 PS1, Line 41: missi > nit: not able to Done http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG@44 PS1, Line 44: can't > nit: look up Done http://gerrit.cloudera.org:8080/#/c/20742/1/be/src/service/frontend.cc File be/src/service/frontend.cc: http://gerrit.cloudera.org:8080/#/c/20742/1/be/src/service/frontend.cc@94 PS1, Line 94: "level catalog-cache operations, i.e. REFRESH/INVALIDATE METADATA <table>, from users" > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/20742/1/fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java File fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java: http://gerrit.cloudera.org:8080/#/c/20742/1/fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java@127 PS1, Line 127: tbl = new LocalIncompleteTable(this, tb > This looks strange to me, as the function's name is getTableIfCached, but w Yeah, for things that we won't cache in the local catalog of coordinator, we still trigger external RPCs to fetch them. I think we need a broader change to make the cache complete. There are some TODOs, e.g. https://github.com/apache/impala/blob/2a8374d7eb17592e6280b15f9adb046a25f2fb85/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L194-L196 The alternative is a good idea! I'll use that instead. http://gerrit.cloudera.org:8080/#/c/20742/1/fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java@144 PS1, Line 144: tbl = LocalTable.load(this, tblName); : } catch (TableLoadingException tle) { : // If the table fails to load (eg a Kudu table that doesn't have : // a backing table, or some other catalogd-side issue), turn it into : // an IncompleteTable. This allows st > This will load the table twice if loading it fails for some reason. Not rea Resolved by using the alternative solution. http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py File tests/authorization/test_ranger.py: http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py@1116 PS1, Line 1116: # getTableIfCached() in LocalCatalog loads a minimal incomplete table : # that does not include the ownership information. Hence show tables : # *never* show owned tables. TODO(bharathv): Fix in a follow up commit : pytest.xfail("getTableIfCached() faulty behavior, known issue") > Doesn't your change fix this too? Not yet, getTableIfCached() still returns a LocalIncompleteTable if coordinator hasn't cached the table meta (TableMetaRefImpl). http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py@1562 PS1, Line 1562: catalogd_args=LOCAL_CATALOG_CATALOGD_ARGS) : def test_allow_metadata_update_local_catalog(self, unique_name): > Wouldn't it be better to run all tests with local catalog and add special h Yeah, we should turn on local catalog by default and run all tests on it. Currently, we need to make sure the flag works in both catalog modes. http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py@1571 PS1, Line 1571: non_admin_client > Should this be renamed to 'non_admin_client' or 'owner_client'? It seems in Yeah, 'non_admin_client' is better. http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py@1608 PS1, Line 1608: self.execute_query_expect_success( : non_admin_client, : "refresh functional.alltypestiny partition(year=2009, month=1)", user=user) Nice findings! REFRESH on partitions always triggers metadata loading regardless of what catalog mode is used: https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L144-L146 This is expected since we need metadata to analyze the partition columns. The ownership info is loaded by the way. The difference on REFRESH on tables is due to this getTable() invocation: https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L3847 It's called from https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L165 In local catalog mode, getTable() will trigger metadata loading so the ownership info is loaded. I think we should use getTableIfCached() instead, which matches the comments in ResetMetadataStmt#analyze(): > Don't call getTable() to avoid loading the table metadata if it is not yet in > this impalad's catalog cache. However, dbContainsTable() is used in several places. To limit the impact of this patch, it'd be better to revisit it in another JIRA. I filed IMPALA-12591 for this. In the current patch, I think we should add a warning when the ownership is unknown (unloaded). For the 2nd difference, it's due to whether the table is loaded in coordinator side. In the legacy catalog mode, coordinator mirros the catalog cache from catalogd. Once the table is loaded in catalogd, it's also loaded in the coordinator side. That's the state when the REFRESH finished. However, in local catalog mode, coordinator maintains its own cache. When the REFRESH finishes in catalogd, coordinator invalidates the local cache of the table meta. It will be fetched next time it's needed. In this case, the ownership info is missing when analyzing the following INVALIDATE METADATA command. Note that we are calling getTableIfCached() for INVALIDATE METADATA. It won't load the table from catalogd. -- To view, visit http://gerrit.cloudera.org:8080/20742 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678 Gerrit-Change-Number: 20742 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Mon, 04 Dec 2023 08:11:43 +0000 Gerrit-HasComments: Yes
