Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15961 )
Change subject: IMPALA-9692 (part 3): Model QuerySchedule as a protobuf ...................................................................... Patch Set 5: (28 comments) http://gerrit.cloudera.org:8080/#/c/15961/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15961/4//COMMIT_MSG@31 PS4, Line 31: In order to keep payloads reasonable for the evenaul RPC interface, : AdmissionController::ReleaseQuery() and ReleaseQueryBacken > the reason we do this is because ReleaseQuery() and ReleaseQueryBackend() w Right. http://gerrit.cloudera.org:8080/#/c/15961/4//COMMIT_MSG@43 PS4, Line 43: Para > nit: typo? Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/coordinator-backend-state.cc@a95 PS4, Line 95: : > intentionally removed? Yeah, we no longer include the host with the FInstanceExecParams that we send back (because we don't want a duplicate copy of the hostname per finstance), so there's nothing to DCHECK for here anymore. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/exec-params.h File be/src/runtime/exec-params.h: http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/exec-params.h@51 PS4, Line 51: Return a reference > nit: typo Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/exec-params.h@73 PS4, Line 73: std::unordered_map<int32_t, const TPlanFragment&> fragments_; > Nit: In the coordinator, we treat this like a vector, relying on the fragme I guess the reason I did it this way is that the fragments don't come to us in sorted order by plan id, so we would have to insert them into the vector and then sort it, which is O(n*log(n)) in the number of fragments whereas just inserting into the hashmap is O(n) I don't have a strong preference, though. There are of course arguments for and against both approaches. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/exec-params.cc File be/src/runtime/exec-params.cc: http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/exec-params.cc@32 PS4, Line 32: 'exec_reques > nit: typo Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/exec-params.cc@32 PS4, Line 32: reference > nit: typo Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/runtime/exec-params.cc@48 PS4, Line 48: 1); > is it worth adding some DCHECKs to make sure both of these vectors have at Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/admission-controller.h@341 PS4, Line 341: std::unique_ptr<QuerySchedulePB>* schedule_result > might be nice to put this in a struct called AdmissionResponse? even if it Definitely agree that the eventual AdmitQuery RPC will take an AdmitQueryReq and return an AdmitQueryResp. I was going to leave defining those for the eventual patch where I define the whole AdmissionControlService RPC interface since a few other things will change that will impact it (eg. SubmitForAdmission() is blocking but the eventual AdmitQuery RPC will be non-blocking, so the possible 'admit_outcome' values are different) that I haven't fully fleshed out yet. I can define a AdmitQueryResp that just contains a QuerySchedulePB for now or something like that if you want. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/admission-controller.cc@899 PS4, Line 899: move( > nit: we omit the std namespace in .cc files, right? I fixed the instances in this file. Fwiw, there are some instances of "std::unordered_map" in scheduler.cc that are necessary to disambiguate with boost::unordered_map, which is used in scheduler.h. Ideally, we would remove all the uses of boost::unordered_map from Impala, but I figured this patch was big enough as it is so I left them for now. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/admission-controller.cc@1607 PS4, Line 1607: GetMemToAdmit(*schedule, entry.second); : } : } > Nit: You could use GetMemToAdmit() here Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/cluster-membership-mgr.cc File be/src/scheduling/cluster-membership-mgr.cc: http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/cluster-membership-mgr.cc@344 PS4, Line 344: const UniqueIdPB& backend_id > why was this changed to use a backend_id instead of a be_desc? just curious Previously, Coordinator::BackendState had access to its BackendDescriptorPB via its BackendExecParams. That's no longer the case since we don't include the BackendDescriptorPB with the BackendExecParamsPB (since the coordinator already has the BackendDescriptorPBs for all of the backends from the statestore and it would be a bunch of unnecessary data getting transferred to send them for each backend in each QuerySchedulePB). Instead, we just send the BackendIds for each backend, and then anytime the coordinator needs the BackendDescriptorPBs, as it does here, it can look them up in the current membership. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/cluster-membership-mgr.cc@352 PS4, Line 352: INFO > should this ever happen? is there a reason it can't be DFATAL? Yeah, I think this could happen if: - a query gets scheduled on a backend just before the statestore decides that its no longer responding and removes it from the membership - the coordinator then tries to start up execution just after receiving the updated membership but before CancelFromThreadPool gets around to cancelling the query - Exec() to the backend fails so we try to blacklist it, but its no longer in the cluster membership Probably pretty unlikely, but possible. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@56 PS4, Line 56: protobuf struct c > do you mean backend_exec_params? No, the point I'm making is that this is a pointer to something that's a child of QuerySchedule::query_schedule_pb_. I reworded the comment to hopefully make it clearer. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@58 PS4, Line 58: er > 'pb' is a bit of an add name for this, why not just 'backend_exec_params_pb That's fair. Of course I was trying to avoid getting too verbose/repetitive - it would be unfortunate to have a bunch of instances of 'backend_exec_params.backend_exec_params_pb->..' all over Scheduler. Combined with your suggestion to rename QuerySchedule/QuerySchedulePB, here's what I'm thinking: 'QuerySchedule' is arguably made up of two parts: the actual schedule (QuerySchedulePB) and some intermediate state about the query for the convenience of Scheduler/AdmissionController, so maybe the best thing would be to rename 'QuerySchedule' to something like 'ScheduleState'? Then, maybe I rename the rest of the classes here to match, i.e. BackendScheduleState/FragmentScheduleState/etc, so they also don't clash with their corresponding BackendExecParamsPB/etc. Then, I can just name this 'exec_params', and using it will look like 'backend_schedule_state.exec_params->...' I'm gonna save actually doing this until we have agreement (or maybe even submit it in a followup to keep from making this patch even messier). http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@82 PS4, Line 82: Scheduler::Comput > Nit: BackendExecParamsPB Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@108 PS4, Line 108: iven backend will > swapped to 'BackendExecParamsPB' right? Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@114 PS4, Line 114: _' > same comment as above, name could be revised. Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@124 PS4, Line 124: l > so we shouldn't use this class (or any of the related structs) outside the Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@130 PS4, Line 130: FInstanceExec > it would be nice if this class or QuerySchedulePB was renamed. QuerySchedul Addressed in another comment. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@193 PS4, Line 193: const TPlanFragment& GetContainingFragment(PlanNodeId node_id) const { > why is non-const access to per_backend_exec_params_ and fragment_exec_param We used to use a pattern where Scheduler creates a local PerBackendExecParams, fills it in, does a couple more loops over it to set some additional parameters, and then gives it to QuerySchedule via set_per_backend_exec_params(), after which its no longer modified. With this change, we have QuerySchedule manage creating the individual BackendExecParams in GetOrCreateBackendExecParams() so that it can create the BackendExecParamsPBs, and so when we do the additional loops over it we need to be able to access it in a non-const way. Of course, now that this class is only used by Scheduler/AdmissionController, it matters a little less cause we're sort of trusting them to mutate the schedule correctly (i.e. we already expose non-const access to the FragmentExecParams below) http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@285 PS4, Line 285: > mention that ownership is transferred to the coordinator after submission h Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.h@320 PS4, Line 320: : /// Used to generate consecutive fragment instance ids. > Nit: Same thought I mentioned on ExecParams, the fragment indexes are expec Addressed on your other comment. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.cc File be/src/scheduling/query-schedule.cc: http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.cc@86 PS4, Line 86: for (int i = 0; i < fragments_.size(); ++i) > can u still use the 'for ([var] : fragments)' pattern? No, we rely on 'fragment_exec_params_' being in order of fragment idx, but since fragments_ is an unordered_map the order of iteration over it isn't guaranteed (that is, unless I go with Joe's suggestion to make 'fragments_' a vector, see other comments for details) http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.cc@153 PS4, Line 153: for (const auto& entry : per_backend_exec_params_) { > is this just general cleanup? I guess per_backend_exec_params_ is smaller t Validate() is called after the FInstanceExecParamsPB have been swapped to the BackendExecParamsPB, so we have to iterate over per_backend_exec_params_ to have access to them. Added some comments about this. http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/query-schedule.cc@201 PS4, Line 201: GetOrCreateBackendEx > seems more like 'GetOrCreateBackendExecParams' or 'LoadBackendExecParams' m Done http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/15961/4/be/src/scheduling/scheduler.cc@300 PS4, Line 300: *scan_ranges.mutable_scan_ranges() = {entry.second.begin(), entry.second.end()}; : } > why do u have to do this lookup? how was it working before? The structure has changed, because thrift allows maps with value types that are lists but protobuf doesn't (see eg. the comment on ScanRangesPB definition) So previously 'first_entry->second' and 'instance_params.per_node_scan_ranges' where both of the form 'map<int, list<ScanRangeParamsPB>>' and we could just copy one straight into the other but now we're copying into a protobuf struct of the form 'map<int, ScanRangesPB>' where ScanRangesPB just contains a 'list<ScanRangeParamsPB>', so we have to iterate over and copy each element individually to do the transform. http://gerrit.cloudera.org:8080/#/c/15961/4/common/protobuf/admission_control_service.proto File common/protobuf/admission_control_service.proto: http://gerrit.cloudera.org:8080/#/c/15961/4/common/protobuf/admission_control_service.proto@39 PS4, Line 39: n i > typo? No, its saying that this is a fragment-level value not a query-level value. I'll reword this to hopefully make it clearer. -- To view, visit http://gerrit.cloudera.org:8080/15961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1db64e72f84604b1d8ac24e0bdd4ad6bedd6bcd9 Gerrit-Change-Number: 15961 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Fri, 19 Jun 2020 00:50:36 +0000 Gerrit-HasComments: Yes
