Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10060 )

Change subject: [Preview]IMPALA-5216: Make admission control queuing async
......................................................................


Patch Set 2:

(25 comments)

addressed review comment, working on TODOs

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.h@329
PS2, Line 329:   bool execution_started_ = false;
> leftover?
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.cc@160
PS2, Line 160:       // Ensure 'this' Coordinator object can be safely 
destroyed.
             :       DCHECK(query_status_.ok());
             :       query_status_ = prepare_status;
             :       CancelInternal();
> this is somewhat subtle logic, so it'd be good to combine it with the tail
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h@45
PS2, Line 45: AdmissionStatus
> Maybe it should be AdmissionOutcome? This is really the final outcome of ad
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h@199
PS2, Line 199:   /// returns an OK status, schedule->is_admitted() is true and 
admit_status is ADMITTED.
> Is (admit_status.IsSet() && admit_status.Get() == ADMITTED) equivalent to s
Done. It seems like the check for isAdmitted() in ReleaseQuery() was redundant 
as it is only ever called if the query was successfully admitted.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h@395
PS2, Line 395:     Promise<AdmissionStatus>* admit_status;
> What owns the memory of admit_status?
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@505
PS2, Line 505:     lock_guard<mutex> lock(admission_ctrl_lock_);
> Do we need to worry about races between admit_status being set to CANCELLED
In case admit_status is set to cancelled before this, the following will happen:

for REJECTED the admission controller thread would simple exit after checking 
the status returned by AdmitQuery().

for ADMITTED immediately, the cancellation check after returning from 
AdmitQuery will take care of it.

I agree that the admit_status would not be consistent with the admission 
decision logged, hence I will change this to return immediately if 
admit_status->TrySet fails and let the cancellation check in CRS do the rest.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@576
PS2, Line 576:         
admit_status->TrySet(AdmissionStatus::REJECTED_OR_TIMED_OUT);
> It would be nice to log the outcome of admission in all cases. Maybe we sho
that makes sense, but in this case, for every admission decision, there is more 
context that needs to be printed to the log which is done by the callee.
For eg, printing the reason of rejection, if admitted immediately, etc.
If you feel that separately printing the admission outcome when tryset is 
invoked will help minimize bugs, then we can add it for sure.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@595
PS2, Line 595:       if (queue->Remove(&queue_node)) {
> One line?
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@885
PS2, Line 885:         // TODO: Maybe dont even check cancelled here, just try 
admitting it and let the
> +1 to removing a code path if it isn't totally necessary. As long as we cat
I thought about this more and it seems like we should keep it. for the case 
where a query completes and calls AdmissionController::ReleaseQuery(), this 
will notify the dequeue thread, if cancel is called while dequeue thread is 
processing the query and CanAdmitRequest is false, then if we dont check for 
cancellation there, the dequeue thread will block until it is awoken again. 
This will cause some delay in admitting the query waiting in line after the 
cancelled query.

The right behavior should be that if it is cancelled, it should dequeue the 
query so that it can process the next one.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@901
PS2, Line 901: DCHECK
> We should really log the actual value if the DCHECK fails. I wish we have a
Done


http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/scheduling/query-schedule.h@274
PS1, Line 274:
> Can the coordinator have a reference to Promise<AdmitStatus> then?  Having
removed since it seems like the check for isAdmitted() in ReleaseQuery() was 
redundant as it is only ever called if the query was successfully admitted.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h@279
PS2, Line 279:   /// admit or reject or mark timed out) or on cancellation of 
query (to mark cancelled).
> Maybe slightly expand on how cancellation works. I think a few important th
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h@282
PS2, Line 282:   Promise<AdmissionStatus> admit_status_;
> After reading through the rest of the patch I feel like it might be benefic
As discussed, modified the class comment for Promise class.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h@318
PS2, Line 318: Exec
> Exec()
Done


http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/client-request-state.h@316
PS1, Line 316:   /// after Exec() has been successfully called on it (that is, 
if 'coord_exec_started_'
> I think that would work. It would also be ok if it was a wrapper around std
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@906
PS2, Line 906:    if(uses_admission_control()){
> Nit: missing whitespace. Fits on one line?
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@909
PS2, Line 909:    coordinator = coord();
> Does this line need to be in the critical section? It seems like we don't n
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1121
PS2, Line 1121: nullptr
> nullptr arg not needed
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1146
PS2, Line 1146:   if (exec_env_->admission_controller() != nullptr) {
> Let's remove the exec_env_ member and use ExecEnv::GetInstance() while we'r
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1147
PS2, Line 1147: status
> It's a bit hard to follow the control flow around 'status'. I wonder if it
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1151
PS2, Line 1151:     if (int sleep_ms = GetSleepTimeFromDebugOptions(
> We generally avoid assigning variables inside if statements. This isn't too
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1159
PS2, Line 1159:       // TODO: this is a very small window in practice, not 
sure if its worth the
> So it's very unlikely that this would save us work? It looks like it could
My initial thought was that since cancellation is initiated by a user, and the 
time window between successful admission and coord being started is very small 
from a human's perspective. But I guess if the code around this is easy to 
understand it then it might be worth keeping it.


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1178
PS2, Line 1178:   if (int sleep_ms = GetSleepTimeFromDebugOptions(
> Same comment about if statement.
Done


http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1198
PS2, Line 1198:   if(cancelled) {
> whitespace
Done


http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/impala-hs2-server.cc@465
PS1, Line 465:   // TODO: Does it make sense to change "in_flight_queries" to 
"initialized_queries".
> I'm ok with deferring the work to a later cleanup patch too so long as we d
added explanation and a TODO where in_flight_queries is declared



--
To view, visit http://gerrit.cloudera.org:8080/10060
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf
Gerrit-Change-Number: 10060
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 24 Apr 2018 22:34:38 +0000
Gerrit-HasComments: Yes

Reply via email to