Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing 
by skipping unnecessary events
......................................................................


Patch Set 4:

(5 comments)

Sorry to be late here but I still have some concerns on the solution.

http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@301
PS4, Line 301:   private AtomicReference<Map<String, Map<String,Integer>>> 
refreshMetadataMap =
After checking more scenarios, I realized this map could lead to mem leak if 
items are not removed correctly. We should carefully maintain their life cycles 
to match their corresponding catalog objects. E.g. when a partition or table is 
dropped, we need to remove the corresponding item. For a DROP DB CASCADE 
statement, we need to remove items of all the underlying tables. For RENAME 
operations on a table or db, we should also update the map correspondingly. 
These are not handled by the current patch. There might be more cases that we 
need to review.

Instead of maintaining a map like this, another solution is adding a 
"lastRefreshTime" field in the Table and HdfsPartition objects. The advantage 
is we don't need to worry about life cycles of the map items. The required 
changes might be similar to this patch but we don't need to add codes about 
removing map items when the corresponding catalog objects are dropped/renamed.

What do you think?


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2692
PS4, Line 2692:           return result.toTCatalogObject();
We should remove the map item before returning here.


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2702
PS4, Line 2702:         return null;
Is it possible if the db is not in the catalog cache but somehow we have 
corresponding items in the refreshMetadataMap?


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3882
PS4, Line 3882:     return null;
If the partitionName doesn't exist in the keys, it's also possible that there 
was a table level refresh so the map was cleared and only has the key of 
DEFAULT_PARTITION_NAME. In this case we should check DEFAULT_PARTITION_NAME as 
well.


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2534
PS4, Line 2534: eventTime
nit: "lastRefreshTime" might be more clear



--
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Sun, 05 Mar 2023 13:07:39 +0000
Gerrit-HasComments: Yes

Reply via email to