Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20742 )

Change subject: IMPALA-11501: Add flag to allow catalog-cache operations on 
masked tables
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG@39
PS1, Line 39: doesn'
> nit: not always, as getTableIfCached() will return the correct table getTab
Ah, right. Let me rephrase this.


http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG@40
PS1, Line 40: ble, ge
> nit: doesn't
Done


http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG@41
PS1, Line 41: missi
> nit: not able to
Done


http://gerrit.cloudera.org:8080/#/c/20742/1//COMMIT_MSG@44
PS1, Line 44:  can't
> nit: look up
Done


http://gerrit.cloudera.org:8080/#/c/20742/1/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/20742/1/be/src/service/frontend.cc@94
PS1, Line 94:     "level catalog-cache operations, i.e. REFRESH/INVALIDATE 
METADATA <table>, from users"
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20742/1/fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java:

http://gerrit.cloudera.org:8080/#/c/20742/1/fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java@127
PS1, Line 127:         tbl = new LocalIncompleteTable(this, tb
> This looks strange to me, as the function's name is getTableIfCached, but w
Yeah, for things that we won't cache in the local catalog of coordinator, we 
still trigger external RPCs to fetch them. I think we need a broader change to 
make the cache complete. There are some TODOs, e.g.
https://github.com/apache/impala/blob/2a8374d7eb17592e6280b15f9adb046a25f2fb85/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L194-L196

The alternative is a good idea! I'll use that instead.


http://gerrit.cloudera.org:8080/#/c/20742/1/fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java@144
PS1, Line 144:         tbl = LocalTable.load(this, tblName);
             :       } catch (TableLoadingException tle) {
             :         // If the table fails to load (eg a Kudu table that 
doesn't have
             :         // a backing table, or some other catalogd-side issue), 
turn it into
             :         // an IncompleteTable. This allows st
> This will load the table twice if loading it fails for some reason. Not rea
Resolved by using the alternative solution.


http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py
File tests/authorization/test_ranger.py:

http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py@1116
PS1, Line 1116:       # getTableIfCached() in LocalCatalog loads a minimal 
incomplete table
              :       # that does not include the ownership information. Hence 
show tables
              :       # *never* show owned tables. TODO(bharathv): Fix in a 
follow up commit
              :       pytest.xfail("getTableIfCached() faulty behavior, known 
issue")
> Doesn't your change fix this too?
Not yet, getTableIfCached() still returns a LocalIncompleteTable if coordinator 
hasn't cached the table meta (TableMetaRefImpl).


http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py@1562
PS1, Line 1562:     catalogd_args=LOCAL_CATALOG_CATALOGD_ARGS)
              :   def test_allow_metadata_update_local_catalog(self, 
unique_name):
> Wouldn't it be better to run all tests with local catalog and add special h
Yeah, we should turn on local catalog by default and run all tests on it. 
Currently, we need to make sure the flag works in both catalog modes.


http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py@1571
PS1, Line 1571: non_admin_client
> Should this be renamed to 'non_admin_client' or 'owner_client'? It seems in
Yeah, 'non_admin_client' is better.


http://gerrit.cloudera.org:8080/#/c/20742/1/tests/authorization/test_ranger.py@1608
PS1, Line 1608:       self.execute_query_expect_success(
              :           non_admin_client,
              :           "refresh functional.alltypestiny partition(year=2009, 
month=1)", user=user)
Nice findings!

REFRESH on partitions always triggers metadata loading regardless of what 
catalog mode is used:
https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L144-L146
This is expected since we need metadata to analyze the partition columns. The 
ownership info is loaded by the way.

The difference on REFRESH on tables is due to this getTable() invocation:
https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L3847
It's called from
https://github.com/apache/impala/blob/4c762725c707f8d150fe250c03faf486008702d4/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java#L165
In local catalog mode, getTable() will trigger metadata loading so the 
ownership info is loaded. I think we should use getTableIfCached() instead, 
which matches the comments in ResetMetadataStmt#analyze():

> Don't call getTable() to avoid loading the table metadata if it is not yet in 
> this impalad's catalog cache.

However, dbContainsTable() is used in several places. To limit the impact of 
this patch, it'd be better to revisit it in another JIRA. I filed IMPALA-12591 
for this.

In the current patch, I think we should add a warning when the ownership is 
unknown (unloaded).

For the 2nd difference, it's due to whether the table is loaded in coordinator 
side. In the legacy catalog mode, coordinator mirros the catalog cache from 
catalogd. Once the table is loaded in catalogd, it's also loaded in the 
coordinator side. That's the state when the REFRESH finished.
However, in local catalog mode, coordinator maintains its own cache. When the 
REFRESH finishes in catalogd, coordinator invalidates the local cache of the 
table meta. It will be fetched next time it's needed. In this case, the 
ownership info is missing when analyzing the following INVALIDATE METADATA 
command. Note that we are calling getTableIfCached() for INVALIDATE METADATA. 
It won't load the table from catalogd.



--
To view, visit http://gerrit.cloudera.org:8080/20742
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45935654cbf05a55d740f1b04781022c271f7678
Gerrit-Change-Number: 20742
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Mon, 04 Dec 2023 08:11:43 +0000
Gerrit-HasComments: Yes

Reply via email to