Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )
Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events ...................................................................... Patch Set 4: (5 comments) Sorry to be late here but I still have some concerns on the solution. http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@301 PS4, Line 301: private AtomicReference<Map<String, Map<String,Integer>>> refreshMetadataMap = After checking more scenarios, I realized this map could lead to mem leak if items are not removed correctly. We should carefully maintain their life cycles to match their corresponding catalog objects. E.g. when a partition or table is dropped, we need to remove the corresponding item. For a DROP DB CASCADE statement, we need to remove items of all the underlying tables. For RENAME operations on a table or db, we should also update the map correspondingly. These are not handled by the current patch. There might be more cases that we need to review. Instead of maintaining a map like this, another solution is adding a "lastRefreshTime" field in the Table and HdfsPartition objects. The advantage is we don't need to worry about life cycles of the map items. The required changes might be similar to this patch but we don't need to add codes about removing map items when the corresponding catalog objects are dropped/renamed. What do you think? http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2692 PS4, Line 2692: return result.toTCatalogObject(); We should remove the map item before returning here. http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2702 PS4, Line 2702: return null; Is it possible if the db is not in the catalog cache but somehow we have corresponding items in the refreshMetadataMap? http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3882 PS4, Line 3882: return null; If the partitionName doesn't exist in the keys, it's also possible that there was a table level refresh so the map was cleared and only has the key of DEFAULT_PARTITION_NAME. In this case we should check DEFAULT_PARTITION_NAME as well. http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java: http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2534 PS4, Line 2534: eventTime nit: "lastRefreshTime" might be more clear -- To view, visit http://gerrit.cloudera.org:8080/19484 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574 Gerrit-Change-Number: 19484 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Sun, 05 Mar 2023 13:07:39 +0000 Gerrit-HasComments: Yes
