Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19378 )
Change subject: IMPALA:11808: Added support for reload event in catalogD ...................................................................... Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@518 PS4, Line 518: FireEventRequestData data = new FireEventRequestData(); What is the expectation for other HMS users than Impala when receiving this event? Will they ignore it because they do not know the new member in the union FireEventRequestData? https://github.com/apache/hive/blob/da42736483ce110997f924a746c63b4bc8a8d5b7/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L1480 http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6401 PS4, Line 6401: MetastoreShim.fireReloadEventHelper(catalog_.getMetaStoreClient(), Do I understand correctly that we do not check whether the table has actually changed during REFRESH and always send an event? I have some concerns about potential performance impact of the change. If N Impala clusters connect to an HMS and all do REFRESH for a table(or INVALIDATE METADATA + actually use the table), then all REFRESH/INVALIDATE METADATA triggers an event and causes other catalogds to also do the reload, leading to each catalogd doing N refresh. I think that this can be solved for REFRESH by skipping the event generation if actually nothing has changed in the table. If there are changes (e.g. new files were written to an external table without sending the relevant events to HMS) then it useful to notify other Impalas to refresh their metadata snapshots, but otherwise the REFRESH was NOOP, it doesn't seem meaningful to send an event. INVALIDATE METADATA seems like a harder case - we don't know whether there were any changes while unnecessarily invalidating + reloading the table on other catalogds can be very expensive. I think that it would be actually the best to not propagate INVALIDATE METADATA to other cluster, at least by default. A query option could be added to set whether to send the event, so someone could initiate a "global" INVALIDATE METADATA, but not do it by default. http://gerrit.cloudera.org:8080/#/c/19378/4/tests/custom_cluster/test_events_custom_configs.py File tests/custom_cluster/test_events_custom_configs.py: http://gerrit.cloudera.org:8080/#/c/19378/4/tests/custom_cluster/test_events_custom_configs.py@255 PS4, Line 255: . nit: +4 indentation http://gerrit.cloudera.org:8080/#/c/19378/4/tests/custom_cluster/test_events_custom_configs.py@258 PS4, Line 258: self.client.execute("drop table {}.{}".format(unique_database, test_reload_table)) there is no need to drop the table, unique_database will be cleaned up by test framework -- To view, visit http://gerrit.cloudera.org:8080/19378 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic62d58837d356dc2113f3c0904228ac9de484136 Gerrit-Change-Number: 19378 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Mon, 02 Jan 2023 16:01:15 +0000 Gerrit-HasComments: Yes
