Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend ......................................................................
Patch Set 9: (22 comments) Focused mostly where I left off previously, from the Coordinator. http://gerrit.cloudera.org:8080/#/c/4054/6/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 229: .. total # fragments -1? http://gerrit.cloudera.org:8080/#/c/4054/9/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS9, Line 227: /// Index of 'this' in Coordinator::fragment_instance_states_ : int state_idx_; not used PS9, Line 485: int32_t inconsistent use of int/int32_t PS9, Line 486: nit: space PS9, Line 488: num_remaining_fragment_instances_ = num_remote_instances; move this above near other usage of num_remote_instances PS9, Line 494: // to keep things simple, make async Cancel() calls wait until plan fragment : // execution has been initiated, otherwise we might try to cancel fragment : // execution at Impala daemons where it hasn't even started I don't understand why this comment belongs here. Am I missing something or should it be moved/removed? PS9, Line 542: FinishQueryStartup Seems weird that we wouldn't have to call a function called 'FinishQueryStartup()' when there are no remote finstances. I'd suggest changing the name but I guess it's fine since I assume this will go away with Henry's upcoming change. PS9, Line 637: // apparently this was never called for the coordinator fragment : //exec_state->ComputeTotalSplitSize( : //rpc_params.fragment_instance_ctx.per_node_scan_ranges); Why leave this as a comment? I'd imagine it will be called for the coord fragment after Henry's change to make the coord fragment start like any other. PS9, Line 645: GetInstanceIdx(coord_state->fragment_instance_id()) this and other cases where we index into fragment_instance_states_ would be nice to use FragmentInstanceState::state_idx_ It looks to me like state_idx_ is not set or used yet. If we're not going to use it, let's remove it. PS9, Line 735: // DCHECK(filter_mode_ == TRuntimeFilterMode::OFF); why is this a comment? PS9, Line 746: instance_state_idx same thing about using state_idx_ PS9, Line 1555: const DebugOptions* debug_options It'd be nice to avoid passing these through the fn. I don't see any reason they can't just be set on FragmentInstanceState*. PS9, Line 1564: int instance_state_idx = GetInstanceIdx(exec_params.instance_id); : FragmentInstanceState* exec_state = fragment_instance_states_[instance_state_idx]; same thing about the state_idx_ parameter PS9, Line 1934: &fragment_profiles_[instance_state->fragment_id()]; was removing the bounds checks intentional? Here and below. http://gerrit.cloudera.org:8080/#/c/4054/6/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: Line 101: nit: extra line http://gerrit.cloudera.org:8080/#/c/4054/9/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: PS9, Line 547: Also sets up and : /// registers the state for the single coordinator fragment instance. Looks to me like it sets up the state for all non-coordinator fragment instances. http://gerrit.cloudera.org:8080/#/c/4054/9/be/src/scheduling/query-resource-mgr.cc File be/src/scheduling/query-resource-mgr.cc: PS9, Line 53: void ResourceResolver::GetResourceHostport(const TNetworkAddress& src, : TNetworkAddress* dest) const { : if (ExecEnv::GetInstance()->is_pseudo_distributed_llama()) { : auto entry = impalad_to_dn_.find(src); : if (entry != impalad_to_dn_.end()) { : *dest = entry->second; : } else { : *dest = TNetworkAddress(); : } : } else { : dest->hostname = src.hostname; : dest->port = 0; : } I take it you'll rebase on the llama removal patch? http://gerrit.cloudera.org:8080/#/c/4054/9/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: PS9, Line 46: boost::unordered_map Do you know if we're avoiding std::unordered_map? PS9, Line 176: extra spaces http://gerrit.cloudera.org:8080/#/c/4054/4/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: PS4, Line 541: // total_assigned_bytes won't hit total_size until everything is assigned : while (params_idx < params_list.size() > i'm not sure what to add. Ignore this, I misread this code before. http://gerrit.cloudera.org:8080/#/c/4054/9/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: PS9, Line 942: // TODO-MT: call AdmitQuery() any reason you can't do this now? http://gerrit.cloudera.org:8080/#/c/4054/9/be/src/service/fragment-mgr.cc File be/src/service/fragment-mgr.cc: PS9, Line 148: VLOG_QUERY This might be noisy for QUERY -- To view, visit http://gerrit.cloudera.org:8080/4054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-HasComments: Yes