Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/10060 )
Change subject: [Preview]IMPALA-5216: Make admission control queuing async ...................................................................... Patch Set 2: (25 comments) addressed review comment, working on TODOs http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.h@329 PS2, Line 329: bool execution_started_ = false; > leftover? Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/runtime/coordinator.cc@160 PS2, Line 160: // Ensure 'this' Coordinator object can be safely destroyed. : DCHECK(query_status_.ok()); : query_status_ = prepare_status; : CancelInternal(); > this is somewhat subtle logic, so it'd be good to combine it with the tail Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h@45 PS2, Line 45: AdmissionStatus > Maybe it should be AdmissionOutcome? This is really the final outcome of ad Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h@199 PS2, Line 199: /// returns an OK status, schedule->is_admitted() is true and admit_status is ADMITTED. > Is (admit_status.IsSet() && admit_status.Get() == ADMITTED) equivalent to s Done. It seems like the check for isAdmitted() in ReleaseQuery() was redundant as it is only ever called if the query was successfully admitted. http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.h@395 PS2, Line 395: Promise<AdmissionStatus>* admit_status; > What owns the memory of admit_status? Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@505 PS2, Line 505: lock_guard<mutex> lock(admission_ctrl_lock_); > Do we need to worry about races between admit_status being set to CANCELLED In case admit_status is set to cancelled before this, the following will happen: for REJECTED the admission controller thread would simple exit after checking the status returned by AdmitQuery(). for ADMITTED immediately, the cancellation check after returning from AdmitQuery will take care of it. I agree that the admit_status would not be consistent with the admission decision logged, hence I will change this to return immediately if admit_status->TrySet fails and let the cancellation check in CRS do the rest. http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@576 PS2, Line 576: admit_status->TrySet(AdmissionStatus::REJECTED_OR_TIMED_OUT); > It would be nice to log the outcome of admission in all cases. Maybe we sho that makes sense, but in this case, for every admission decision, there is more context that needs to be printed to the log which is done by the callee. For eg, printing the reason of rejection, if admitted immediately, etc. If you feel that separately printing the admission outcome when tryset is invoked will help minimize bugs, then we can add it for sure. http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@595 PS2, Line 595: if (queue->Remove(&queue_node)) { > One line? Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@885 PS2, Line 885: // TODO: Maybe dont even check cancelled here, just try admitting it and let the > +1 to removing a code path if it isn't totally necessary. As long as we cat I thought about this more and it seems like we should keep it. for the case where a query completes and calls AdmissionController::ReleaseQuery(), this will notify the dequeue thread, if cancel is called while dequeue thread is processing the query and CanAdmitRequest is false, then if we dont check for cancellation there, the dequeue thread will block until it is awoken again. This will cause some delay in admitting the query waiting in line after the cancelled query. The right behavior should be that if it is cancelled, it should dequeue the query so that it can process the next one. http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/scheduling/admission-controller.cc@901 PS2, Line 901: DCHECK > We should really log the actual value if the DCHECK fails. I wish we have a Done http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/scheduling/query-schedule.h@274 PS1, Line 274: > Can the coordinator have a reference to Promise<AdmitStatus> then? Having removed since it seems like the check for isAdmitted() in ReleaseQuery() was redundant as it is only ever called if the query was successfully admitted. http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h@279 PS2, Line 279: /// admit or reject or mark timed out) or on cancellation of query (to mark cancelled). > Maybe slightly expand on how cancellation works. I think a few important th Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h@282 PS2, Line 282: Promise<AdmissionStatus> admit_status_; > After reading through the rest of the patch I feel like it might be benefic As discussed, modified the class comment for Promise class. http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.h@318 PS2, Line 318: Exec > Exec() Done http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/client-request-state.h@316 PS1, Line 316: /// after Exec() has been successfully called on it (that is, if 'coord_exec_started_' > I think that would work. It would also be ok if it was a wrapper around std Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@906 PS2, Line 906: if(uses_admission_control()){ > Nit: missing whitespace. Fits on one line? Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@909 PS2, Line 909: coordinator = coord(); > Does this line need to be in the critical section? It seems like we don't n Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1121 PS2, Line 1121: nullptr > nullptr arg not needed Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1146 PS2, Line 1146: if (exec_env_->admission_controller() != nullptr) { > Let's remove the exec_env_ member and use ExecEnv::GetInstance() while we'r Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1147 PS2, Line 1147: status > It's a bit hard to follow the control flow around 'status'. I wonder if it Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1151 PS2, Line 1151: if (int sleep_ms = GetSleepTimeFromDebugOptions( > We generally avoid assigning variables inside if statements. This isn't too Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1159 PS2, Line 1159: // TODO: this is a very small window in practice, not sure if its worth the > So it's very unlikely that this would save us work? It looks like it could My initial thought was that since cancellation is initiated by a user, and the time window between successful admission and coord being started is very small from a human's perspective. But I guess if the code around this is easy to understand it then it might be worth keeping it. http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1178 PS2, Line 1178: if (int sleep_ms = GetSleepTimeFromDebugOptions( > Same comment about if statement. Done http://gerrit.cloudera.org:8080/#/c/10060/2/be/src/service/client-request-state.cc@1198 PS2, Line 1198: if(cancelled) { > whitespace Done http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/10060/1/be/src/service/impala-hs2-server.cc@465 PS1, Line 465: // TODO: Does it make sense to change "in_flight_queries" to "initialized_queries". > I'm ok with deferring the work to a later cleanup patch too so long as we d added explanation and a TODO where in_flight_queries is declared -- To view, visit http://gerrit.cloudera.org:8080/10060 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I989cf5b259afb8f5bc5c35590c94961c81ce88bf Gerrit-Change-Number: 10060 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 24 Apr 2018 22:34:38 +0000 Gerrit-HasComments: Yes
