Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21922 )
Change subject: IMPALA-13438 Batch the `addHmsPartitions` operations in `alterTableRecoverPartitions`.
......................................................................

Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21922/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21922/2//COMMIT_MSG@20
PS2, Line 20: An analysis of the memory dump using the MemoryAnalyzer revealed that the temporary object
            : contained a massive number of FieldSchema objects (2000 columns * 50,000 partitions),
            : which overwhelmed memory resources.
> I have reproduced the issue on the latest master branch. Although IMPALA-11

I see. I checked the code. When the event-processor is disabled, the Partition objects come from MetaStoreUtil.addPartitions():
https://github.com/apache/impala/blob/2535e79491078a0353dbeed1a094e91366906149/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5319-L5321

In that method, the FieldSchema list of each partition is replaced with a reference to the table's list, so we shouldn't see lots of FieldSchema objects:
https://github.com/apache/impala/blob/2535e79491078a0353dbeed1a094e91366906149/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java#L248

However, when the event-processor is enabled, the Partition objects are extracted from HMS events:
https://github.com/apache/impala/blob/2535e79491078a0353dbeed1a094e91366906149/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5343

In that case the partition list returned by MetaStoreUtil.addPartitions() is not used as the result.
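To illustrate the difference between the two paths, here is a rough sketch of the schema-sharing idea. It does not use the real HMS thrift classes or the actual MetaStoreUtil code: `FieldSchema`, `Table`, `Partition`, `shareTableSchema()` and `countDistinctColumnLists()` are all hypothetical stand-ins made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class SchemaSharingSketch {
  // Hypothetical stand-ins for the HMS thrift objects; not the real classes.
  static final class FieldSchema {
    final String name;
    final String type;
    FieldSchema(String name, String type) { this.name = name; this.type = type; }
  }

  static final class Table {
    final List<FieldSchema> cols;
    Table(List<FieldSchema> cols) { this.cols = cols; }
  }

  static final class Partition {
    List<FieldSchema> cols;
    Partition(List<FieldSchema> cols) { this.cols = cols; }
  }

  // Builds a new, independent column list, as if deserialized from an event.
  static List<FieldSchema> freshColumns(int numCols) {
    List<FieldSchema> cols = new ArrayList<>();
    for (int i = 0; i < numCols; i++) cols.add(new FieldSchema("c" + i, "string"));
    return cols;
  }

  // Points every partition at the table's column list, so the per-partition
  // copies become unreachable and can be garbage collected.
  static void shareTableSchema(List<Partition> parts, Table tbl) {
    for (Partition p : parts) p.cols = tbl.cols;
  }

  // Counts column lists by object identity, not by equals().
  static int countDistinctColumnLists(List<Partition> parts) {
    Set<List<FieldSchema>> seen =
        Collections.newSetFromMap(new IdentityHashMap<List<FieldSchema>, Boolean>());
    for (Partition p : parts) seen.add(p.cols);
    return seen.size();
  }

  public static void main(String[] args) {
    Table tbl = new Table(freshColumns(3));
    List<Partition> parts = new ArrayList<>();
    for (int i = 0; i < 4; i++) parts.add(new Partition(freshColumns(3)));

    System.out.println(countDistinctColumnLists(parts)); // prints 4
    shareTableSchema(parts, tbl);
    System.out.println(countDistinctColumnLists(parts)); // prints 1
  }
}
```

With 2000 columns and 50,000 partitions, the unshared case corresponds to 50,000 separate column lists, while the shared case keeps only the table's one.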
I think we should modify getPartitionsFromEvent() to use the table-level list of FieldSchema:
https://github.com/apache/impala/blob/2535e79491078a0353dbeed1a094e91366906149/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L2479

http://gerrit.cloudera.org:8080/#/c/21922/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21922/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2483
PS2, Line 2483: partitionToEventId.put(part, eventId);
As commented above, I think we should trim the Partitions here using MetaStoreUtil.replaceSchemaFromTable(), so that the many per-partition FieldSchema objects are no longer referenced and can be GCed.

--
To view, visit http://gerrit.cloudera.org:8080/21922
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13aaad8a915f75fbe808bf96b1cf891312b1a592
Gerrit-Change-Number: 21922
Gerrit-PatchSet: 2
Gerrit-Owner: zhangqianqiong <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: zhangqianqiong <[email protected]>
Gerrit-Comment-Date: Mon, 14 Oct 2024 06:57:57 +0000
Gerrit-HasComments: Yes
