Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13307 )

Change subject: IMPALA-8536: Add Scalable Pool Configuration to Admission 
Controller.
......................................................................


Patch Set 4:

(22 comments)

Thanks Tim and Bikram for the thorough reviews - see new patch set 4 for fixes.

http://gerrit.cloudera.org:8080/#/c/13307/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13307/1//COMMIT_MSG@21
PS1, Line 21: + Max Queued Queries Multiple - this floating point number is 
multiplied
> I think this is ok for now. I thought about it some more and having one opt
Done


http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@14
PS3, Line 14: The new configuration parameters are:
> mention that a value of zero means it wont be used and will default to the
Thanks


http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@15
PS3, Line 15: Max Running
> nit: use "Max Requests" to be consistent
OK, this is complicated, like all naming things.
Yes it is "max_requests" in thrift.
But in CM (which is what I am really taking about in the commit msg) it is "Max 
Running Queries".
And in the webui it is "Max concurrent queries"
So there is no totally consistent value.
I think "queries" is clearer than "requests" but we can talk more!


http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@26
PS3, Line 26: this number of bytes is multiplied by the
            :   current total number of executors at runtime to give the maximum
            :   memory available acro
> can you elaborate on what this number is like you did for the ones above? A
Done


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller-test.cc
File be/src/scheduling/admission-controller-test.cc:

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller-test.cc@503
PS3, Line 503:   stats.local_stats_.num_queued = 10;
> ScopedFlagSetter?
Done


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller-test.cc@545
PS3, Line 545:   stats.agg_num_queued_ = host_count;
> Would it make sense to test that the pool stats were actually modified in t
Done


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller-test.cc@555
PS3, Line 555: }
> Would it make sense to test that the pool stats were actually modified in t
Done


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h@582
PS3, Line 582:   bool CanAdmitRequest(const QuerySchedule& schedule, const 
TPoolConfig& pool_cfg,
> I'd prefer to use int64_t for consistency (reasonable people can differ abo
Thanks


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h@603
PS3, Line 603: _size_snapshot,
> nit: maybe use something like latest_cluster_size. the snapshot in the name
The ClusterMembershipMgr that will come in IMPALA-8460 has a method 
GetSnapshot() to fetch the cluster membership. The cluster size used by AC will 
eventually come from there which is why I chose it. What was that made you 
expect a struct? If this is a common Impala term then I should change it.


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h@661
PS3, Line 661:   /// If it can be determined that no queries can currently be 
run, then zero
> It would help to clarify if max_to_dequeue can be 0 if Status is non-OK. If
Yes, that is much clearer thanks.


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1000
PS3, Line 1000:       int64_t max_to_dequeue =
              :           GetMaxToDequeue(queue, stats, pool_config, 
cluster_size_snapshot);
              :       VLOG_RPC << "Dequeue thread will try to admit " << 
max_to_dequeue << " requests"
              :                << ", pool=" << pool_name
> nit: if this functionality is completely moved over to a method, maybe we c
Done


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1007
PS3, Line 1007:
> I agree, would prefer returning max_to_dequeue itself and handling the <= 0
Done


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1007
PS3, Line 1007:
> I think the old behaviour had the advantage of logging something if nothing
Yes, thanks, I moved this logic below the logging.


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1070
PS3, Line 1070:     return min(stats->local_stats().num_queued,
> Shouldn't this use the ceil() logic from GetMaxRequestsForPool()? Or at lea
Yes, should be a double, this was a bug, thanks


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1071
PS3, Line 1071:   max<int64_t>(
> this can still be less than equal to zero, right? like in case of overadmis
Nice, thanks


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1121
PS3, Line 1121:
> Seems like now this case can happen if the cluster size changes (an externa
Yes, that could be confusing.
I will add a msg like: "The min_query_mem_limit 629145600 is greater than the 
current max_mem_resources 419430400 (calculated as 10 backends each with 40.00 
MB); queries will not be admitted until more executors are available."


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1396
PS3, Line 1396:
> this might be a problem if some backends are shutting down and the query wo
I originally used the cluster size from QuerySchedule but hit problems with 
coordinator-only queries where the QuerySchedule sets the number of hosts to 1. 
Those queries did not scale :-(
When ClusterMembershipMgr is done (IMPALA-8460) then AC should use that.
It does seem like backends shutting down could cause over-admission, maybe 
ClusterMembershipMgr could eventually deliver more complex views of the 
membership.
This does seem an important thing to think about.


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1403
PS3, Line 1403:
> just thinking out loud: should we check if this multiple is greater than th
I don;t understand what you mean here. Do you mean 
QuerySchedule.per_backend_mem_to_admit? Which is different for each query?


http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/ImpalaInternalService.thrift@685
PS3, Line 685:   // 0 indicates no limit.
> nit: mention what a value of 0 means for all
Done


http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/metrics.json@83
PS3, Line 83: ent tota
> executors?
I changed this, in line with moving towards an executor group based model. At 
the moment the actual value used is the cluster size.


http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/metrics.json@93
PS3, Line 93: ent tota
> executors?
Done


http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/metrics.json@103
PS3, Line 103: tal numb
> executors?
Done



--
To view, visit http://gerrit.cloudera.org:8080/13307
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If47508728124076f3b9200c27cffc989f7a4f188
Gerrit-Change-Number: 13307
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 31 May 2019 18:01:48 +0000
Gerrit-HasComments: Yes

Reply via email to