Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23436 )
Change subject: IMPALA-14447: Parallelize table loading in getMissingTables() ...................................................................... Patch Set 16: (4 comments) http://gerrit.cloudera.org:8080/#/c/23436/16/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java File fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/23436/16/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@368 PS16, Line 368: if (LOG.isTraceEnabled()) { : LOG.debug("Parallel load {}", tbls); : } nit: do you want it to be in TRACE level or DEBUG level? We check isTraceEnabled() but use LOG.debug() here. On the other hand, we don't need the check if constructing the parameters is not expensive. Here we pass 'tbls' directly so it's not expensive. nit: can we also log 'maxThreads'? http://gerrit.cloudera.org:8080/#/c/23436/16/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@409 PS16, Line 409: ForkJoinPool nit: "ForkJoinPool" is stale http://gerrit.cloudera.org:8080/#/c/23436/16/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java@436 PS16, Line 436: Math.min(tbls.size(), BackendConfig.INSTANCE.getMaxStmtMetadataLoaderThreads()); We don't need multi-threads here in the legacy catalog mode where the local catalog operations are pretty fast. Creating threads has some overhead so it'd be nice to set maxThreads to 1 in legacy catalog mode. http://gerrit.cloudera.org:8080/#/c/23436/16/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java File fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java: http://gerrit.cloudera.org:8080/#/c/23436/16/fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java@400 PS16, Line 400: try (FrontendProfile.Scope scope = FrontendProfile.createNewWithScope()) { : profile = FrontendProfile.getCurrent(); : assertNotNull(profile); : tables.parallelStream().forEach(t -> { nit: It's low priority but can we use the same code in StmtMetadataLoader? Or do we have other tests that check the profile for a given list of tables? -- To view, visit http://gerrit.cloudera.org:8080/23436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97a5165844ae846b28338d62e93a20121488d79f Gerrit-Change-Number: 23436 Gerrit-PatchSet: 16 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Thu, 09 Oct 2025 14:10:55 +0000 Gerrit-HasComments: Yes
