Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/13307 )
Change subject: IMPALA-8536: Add Scalable Pool Configuration to Admission Controller. ...................................................................... Patch Set 4: (22 comments) Thanks Tim and Bikram for the thorough reviews - see new patch set 4 for fixes. http://gerrit.cloudera.org:8080/#/c/13307/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13307/1//COMMIT_MSG@21 PS1, Line 21: + Max Queued Queries Multiple - this floating point number is multiplied > I think this is ok for now. I thought about it some more and having one opt Done http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@14 PS3, Line 14: The new configuration parameters are: > mention that a value of zero means it wont be used and will default to the Thanks http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@15 PS3, Line 15: Max Running > nit: use "Max Requests" to be consistent OK, this is complicated, like all naming things. Yes it is "max_requests" in thrift. But in CM (which is what I am really taking about in the commit msg) it is "Max Running Queries". And in the webui it is "Max concurrent queries" So there is no totally consistent value. I think "queries" is clearer than "requests" but we can talk more! http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@26 PS3, Line 26: this number of bytes is multiplied by the : current total number of executors at runtime to give the maximum : memory available acro > can you elaborate on what this number is like you did for the ones above? A Done http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller-test.cc File be/src/scheduling/admission-controller-test.cc: http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller-test.cc@503 PS3, Line 503: stats.local_stats_.num_queued = 10; > ScopedFlagSetter? Done http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller-test.cc@545 PS3, Line 545: stats.agg_num_queued_ = host_count; > Would it make sense to test that the pool stats were actually modified in t Done http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller-test.cc@555 PS3, Line 555: } > Would it make sense to test that the pool stats were actually modified in t Done http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h@582 PS3, Line 582: bool CanAdmitRequest(const QuerySchedule& schedule, const TPoolConfig& pool_cfg, > I'd prefer to use int64_t for consistency (reasonable people can differ abo Thanks http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h@603 PS3, Line 603: _size_snapshot, > nit: maybe use something like latest_cluster_size. the snapshot in the name The ClusterMembershipMgr that will come in IMPALA-8460 has a method GetSnapshot() to fetch the cluster membership. The cluster size used by AC will eventually come from there which is why I chose it. What was that made you expect a struct? If this is a common Impala term then I should change it. http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h@661 PS3, Line 661: /// If it can be determined that no queries can currently be run, then zero > It would help to clarify if max_to_dequeue can be 0 if Status is non-OK. If Yes, that is much clearer thanks. http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1000 PS3, Line 1000: int64_t max_to_dequeue = : GetMaxToDequeue(queue, stats, pool_config, cluster_size_snapshot); : VLOG_RPC << "Dequeue thread will try to admit " << max_to_dequeue << " requests" : << ", pool=" << pool_name > nit: if this functionality is completely moved over to a method, maybe we c Done http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1007 PS3, Line 1007: > I agree, would prefer returning max_to_dequeue itself and handling the <= 0 Done http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1007 PS3, Line 1007: > I think the old behaviour had the advantage of logging something if nothing Yes, thanks, I moved this logic below the logging. http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1070 PS3, Line 1070: return min(stats->local_stats().num_queued, > Shouldn't this use the ceil() logic from GetMaxRequestsForPool()? Or at lea Yes, should be a double, this was a bug, thanks http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1071 PS3, Line 1071: max<int64_t>( > this can still be less than equal to zero, right? like in case of overadmis Nice, thanks http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1121 PS3, Line 1121: > Seems like now this case can happen if the cluster size changes (an externa Yes, that could be confusing. I will add a msg like: "The min_query_mem_limit 629145600 is greater than the current max_mem_resources 419430400 (calculated as 10 backends each with 40.00 MB); queries will not be admitted until more executors are available." http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1396 PS3, Line 1396: > this might be a problem if some backends are shutting down and the query wo I originally used the cluster size from QuerySchedule but hit problems with coordinator-only queries where the QuerySchedule sets the number of hosts to 1. Those queries did not scale :-( When ClusterMembershipMgr is done (IMPALA-8460) then AC should use that. It does seem like backends shutting down could cause over-admission, maybe ClusterMembershipMgr could eventually deliver more complex views of the membership. This does seem an important thing to think about. http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1403 PS3, Line 1403: > just thinking out loud: should we check if this multiple is greater than th I don;t understand what you mean here. Do you mean QuerySchedule.per_backend_mem_to_admit? Which is different for each query? http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/ImpalaInternalService.thrift@685 PS3, Line 685: // 0 indicates no limit. > nit: mention what a value of 0 means for all Done http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/metrics.json@83 PS3, Line 83: ent tota > executors? I changed this, in line with moving towards an executor group based model. At the moment the actual value used is the cluster size. http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/metrics.json@93 PS3, Line 93: ent tota > executors? Done http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/metrics.json@103 PS3, Line 103: tal numb > executors? Done -- To view, visit http://gerrit.cloudera.org:8080/13307 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If47508728124076f3b9200c27cffc989f7a4f188 Gerrit-Change-Number: 13307 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 31 May 2019 18:01:48 +0000 Gerrit-HasComments: Yes