Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19378 )

Change subject: IMPALA:11808: Added support for reload event in catalogD
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@518
PS4, Line 518:     FireEventRequestData data = new FireEventRequestData();
What is the expectation for other HMS users than Impala when receiving this 
event? Will they ignore it because they do not know the new member in the union 
FireEventRequestData? 
https://github.com/apache/hive/blob/da42736483ce110997f924a746c63b4bc8a8d5b7/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L1480


http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6401
PS4, Line 6401:       
MetastoreShim.fireReloadEventHelper(catalog_.getMetaStoreClient(),
Do I understand correctly that we do not check whether the table has actually 
changed during REFRESH and always send an event?

I have some concerns about potential performance impact of the change. If N 
Impala clusters connect to an HMS and all do REFRESH for a table(or INVALIDATE 
METADATA + actually use the table), then all REFRESH/INVALIDATE METADATA 
triggers an event and causes other catalogds to also do the reload, leading to 
each catalogd doing N refresh.

I think that this can be solved for REFRESH by skipping the event generation if 
actually nothing has changed in the table. If there are changes (e.g. new files 
were written to an external table without sending the relevant events to HMS) 
then it useful to notify other Impalas to refresh their metadata snapshots, but 
otherwise the REFRESH was NOOP, it doesn't seem meaningful to send an event.

INVALIDATE METADATA seems like a harder case - we don't know whether there were 
any changes while unnecessarily invalidating + reloading the table on other 
catalogds can be very expensive. I think that it would be actually the best to 
not propagate INVALIDATE METADATA to other cluster, at least by default. A 
query option could be added to set whether to send the event, so someone could 
initiate a "global" INVALIDATE METADATA, but not do it by default.


http://gerrit.cloudera.org:8080/#/c/19378/4/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/19378/4/tests/custom_cluster/test_events_custom_configs.py@255
PS4, Line 255: .
nit: +4 indentation


http://gerrit.cloudera.org:8080/#/c/19378/4/tests/custom_cluster/test_events_custom_configs.py@258
PS4, Line 258:       self.client.execute("drop table 
{}.{}".format(unique_database, test_reload_table))
there is no need to drop the table, unique_database will be cleaned up by test 
framework



--
To view, visit http://gerrit.cloudera.org:8080/19378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic62d58837d356dc2113f3c0904228ac9de484136
Gerrit-Change-Number: 19378
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Mon, 02 Jan 2023 16:01:15 +0000
Gerrit-HasComments: Yes

Reply via email to