Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16204 )
Change subject: IMPALA-8125: Add query option to limit number of hdfs writer instances ...................................................................... Patch Set 1: (6 comments) I had some initial things that I noticed while doing a pass over it. http://gerrit.cloudera.org:8080/#/c/16204/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16204/1//COMMIT_MSG@9 PS1, Line 9: This patch adds a new query option MAX_HDFS_WRITERS that limits the Maybe we should call it FS instead of HDFS? Just cause the name is a bit anachronistic at this point. http://gerrit.cloudera.org:8080/#/c/16204/1//COMMIT_MSG@26 PS1, Line 26: - Added e2e tests to confirm that the scheduler is enforcing the limit It seemed based on first glance that we should probably have end-to-end tests for more of the different plan shapes that could be generated. Maybe I am missing something though. http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/scheduling/scheduler.cc@496 PS1, Line 496: // This implementation ensures that instances on the same host get consecutive Can you also comment what it's trying to achieve (i.e. Create the desired number of instances while balancing them across hosts). http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/service/query-options.cc@685 PS1, Line 685: break; Uhh... oops. It would be good to file a separate JIRA for the missing breaks. Just cause it could be something people actually run into. http://gerrit.cloudera.org:8080/#/c/16204/1/testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test: http://gerrit.cloudera.org:8080/#/c/16204/1/testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test@6 PS1, Line 6: ---- PLAN Maybe we should only include DISTRIBUTEDPLAN in these tests? I feel like the single node plans are mostly adding noise. http://gerrit.cloudera.org:8080/#/c/16204/1/tests/custom_cluster/test_mt_dop.py File tests/custom_cluster/test_mt_dop.py: http://gerrit.cloudera.org:8080/#/c/16204/1/tests/custom_cluster/test_mt_dop.py@117 PS1, Line 117: @CustomClusterTestSuite.with_args(impalad_args="--unlock_mt_dop=true", cluster_size=3) We actually set unlock_mt_dop=true for the e2e tests, so I think this could be an end-to-end test. -- To view, visit http://gerrit.cloudera.org:8080/16204 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5 Gerrit-Change-Number: 16204 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Wed, 15 Jul 2020 23:47:08 +0000 Gerrit-HasComments: Yes