Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16204 )

Change subject: IMPALA-8125: Add query option to limit number of hdfs writer 
instances
......................................................................


Patch Set 1:

(6 comments)

I had some initial things that I noticed while doing a pass over it.

http://gerrit.cloudera.org:8080/#/c/16204/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16204/1//COMMIT_MSG@9
PS1, Line 9: This patch adds a new query option MAX_HDFS_WRITERS that limits the
Maybe we should call it FS instead of HDFS? Just cause the name is a bit 
anachronistic at this point.


http://gerrit.cloudera.org:8080/#/c/16204/1//COMMIT_MSG@26
PS1, Line 26: - Added e2e tests to confirm that the scheduler is enforcing the 
limit
It seemed based on first glance that we should probably have end-to-end tests 
for more of the different plan shapes that could be generated. Maybe I am 
missing something though.


http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/scheduling/scheduler.cc@496
PS1, Line 496:     // This implementation ensures that instances on the same 
host get consecutive
Can you also comment what it's trying to achieve (i.e. Create the desired 
number of instances while balancing them across hosts).


http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/service/query-options.cc@685
PS1, Line 685:         break;
Uhh... oops. It would be good to file a separate JIRA for the missing breaks. 
Just cause it could be something people actually run into.


http://gerrit.cloudera.org:8080/#/c/16204/1/testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test:

http://gerrit.cloudera.org:8080/#/c/16204/1/testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test@6
PS1, Line 6: ---- PLAN
Maybe we should only include DISTRIBUTEDPLAN in these tests? I feel like the 
single node plans are mostly adding noise.


http://gerrit.cloudera.org:8080/#/c/16204/1/tests/custom_cluster/test_mt_dop.py
File tests/custom_cluster/test_mt_dop.py:

http://gerrit.cloudera.org:8080/#/c/16204/1/tests/custom_cluster/test_mt_dop.py@117
PS1, Line 117:   
@CustomClusterTestSuite.with_args(impalad_args="--unlock_mt_dop=true", 
cluster_size=3)
We actually set unlock_mt_dop=true for the e2e tests, so I think this could be 
an end-to-end test.



--
To view, visit http://gerrit.cloudera.org:8080/16204
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5
Gerrit-Change-Number: 16204
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Jul 2020 23:47:08 +0000
Gerrit-HasComments: Yes

Reply via email to