Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )
Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases' ...................................................................... Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/14846/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14846/1//COMMIT_MSG@14 PS1, Line 14: This patch enables MT_DOP query option to accelerate 'show tables/databases' > Let's not overload mt_dop. mt_dop should only be about query execution, not +1 to this comment above. Lets not use mt_dop for this. http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java@a778 PS1, Line 778: : : Looks like this check is being removed in the code. Is that intentional? http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java@799 PS1, Line 799: poolSize = mtDop > 0 ? mtDop Instead of using mt_dop to get the poolsize, I think it would be good to have a separate config to determine the pool size. Also, since getDbs and getTableNames calls are so frequent (I have seen systems where you can have 10-20 of such calls in parallel at a time), may be it would be good to use a single static thread pool which can accept such tasks. Otherwise, in the the example above, we may have 10*10 threads doing the checkAccess calls if the pool size is 10. http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java@977 PS1, Line 977: : // Wait for the db checking tasks to be finished. : int failedCheckTasks = 0; : int index = 0; : iter = dbs.iterator(); : while (iter.hasNext()) { : iter.next(); : try { : if (!pendingCheckingTasks.get(index).get()) : iter.remove(); : index ++; : } catch (ExecutionException | InterruptedException e) { : failedCheckTasks ++; : LOG.error("Encountered an error checking access for dbs", e); : } : } It looks like a lot of this code is duplicated. Can we refactor this into a method which can be called from both getTableNames and getDbs? http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1032 PS1, Line 1032: hrow new IllegalArgumentException("Params dbName should not be null.") Instead of this, may be its better to have a Preconditions.checkNotNull(dbName) at line 1010. -- To view, visit http://gerrit.cloudera.org:8080/14846 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96 Gerrit-Change-Number: 14846 Gerrit-PatchSet: 1 Gerrit-Owner: Zhou Xu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Comment-Date: Wed, 11 Dec 2019 00:50:11 +0000 Gerrit-HasComments: Yes
