Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3902: Scheduler improvements for running multiple fragment instances on a single backend ......................................................................
Patch Set 4: (27 comments) http://gerrit.cloudera.org:8080/#/c/4054/4/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: PS4, Line 218: const TNetworkAddress local_addr_; > Can you use ExecEnv::backend_address()? Or find some way to avoid the dupli Done PS4, Line 297: Wait > nit: Wait() Done PS4, Line 388: fragment_instance_state_idxs > Is this referencing the "fragment id (TPlanFragment.id)"? If so, please say it's not, it's an index into fragment_instance_states_. i thought that would be clear from the name. added comment. http://gerrit.cloudera.org:8080/#/c/4054/4/be/src/scheduling/query-schedule.cc File be/src/scheduling/query-schedule.cc: PS4, Line 121: std::sort(fragments.begin(), fragments.end(), : [](const TPlanFragment* a, const TPlanFragment* b) { return a->id < b->id; }); > shouldn't these come in an expected order? this adds an extra level of robustness, otherwise i have to make this ordering guarantee part of the interface. i don't expect this to have any performance impact. PS4, Line 346: NULL > nullptr Done http://gerrit.cloudera.org:8080/#/c/4054/4/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: Line 97: std::map<PlanNodeId, int> per_exch_num_senders; > any benefit from using map over unordered_map here? it makes it easier to assign to thrift structs, and this isn't performance-critical. PS4, Line 173: NULL > nullptr Done Line 174: const TPlanFragment* GetCoordFragment() const; > nit: trailing whitespace Done Line 213: MtFragmentExecParams* GetFragmentExecParams(FragmentId id) { > If you need non-const access, isn't it more idiomatic to return a non-const we typically return pointers for mutable structures (and const refs for immutable structures). PS4, Line 219: NULL > nullptr Done Line 307: > comment, e.g. what state in MtFragmentExecParams gets initialized here vs. Done http://gerrit.cloudera.org:8080/#/c/4054/4/be/src/scheduling/simple-scheduler.cc File be/src/scheduling/simple-scheduler.cc: PS4, Line 384: auto > auto only for iterators this is a map iterator PS4, Line 414: depth-first traversal > isn't this an implementation detail of the other MtComputeFragmentExecParam Done PS4, Line 425: f > can you change f indexes here to match the code? it's explained as if it st moved comment PS4, Line 433: const TDataStreamSink& sink = src_fragment.output_sink.stream_sink; : DCHECK( : sink.output_partition.type == TPartitionType::UNPARTITIONED : || sink.output_partition.type == TPartitionType::HASH_PARTITIONED : || sink.output_partition.type == TPartitionType::RANDOM); > move this closer to where it's used on l453 Done PS4, Line 470: MtComputeFragmentExecParams > Can we be sure that recursion depth will not be a problem here? yes Line 524: // try to load-balance scan ranges by assigning just beyond the > nit: wrap at 90 characters Done PS4, Line 541: while (params_idx < params_list.size() : && total_assigned_bytes < threshold_total_bytes) { > It's not obvious to me why this will always assign all params. If the last i'm not sure what to add. for the last fragment, threshold_total_bytes == total_size, so this won't get exceeded. if total_assigned_bytes == total_size, by definition there can't be any params left. let me know how i can clarify. Line 695: for (int j = 0; j < params.hosts.size(); ++j, ++num_fragment_instances) { > you could call reserve() before this line i'll leave the old code as-is. PS4, Line 933: if (!FLAGS_disable_admission_control) { : RETURN_IF_ERROR(admission_controller_->AdmitQuery(schedule)); : } : if (!FLAGS_enable_rm) return Status::OK(); : : string user = GetEffectiveUser(schedule->request().query_ctx.session); : if (user.empty()) user = "default"; : schedule->PrepareReservationRequest(resolved_pool, user); : const TResourceBrokerReservationRequest& reservation_request = : schedule->reservation_request(); : if (!reservation_request.resources.empty()) { : Status status = resource_broker_->Reserve( : reservation_request, schedule->reservation()); : if (!status.ok()) { : // Warn about missing table and/or column stats if necessary. : const TQueryCtx& query_ctx = schedule->request().query_ctx; : if(!query_ctx.__isset.parent_query_id && : query_ctx.__isset.tables_missing_stats && : !query_ctx.tables_missing_stats.empty()) { : status.AddDetail(GetTablesMissingStatsWarning(query_ctx.tables_missing_stats)); : } : return status; : } : RETURN_IF_ERROR(schedule->ValidateReservation()); : AddToActiveResourceMaps(*schedule->reservation(), coord); : } > while the llama related code will be removed soon, I think you should still Done Line 1104: it = backend_map_.insert(make_pair(ip, BackendList())).first; > BackendConfig has been moved to an own file in asf/master, and this code ha i'm sure it will, but i'm holding off on rebases for now. http://gerrit.cloudera.org:8080/#/c/4054/4/be/src/scheduling/simple-scheduler.h File be/src/scheduling/simple-scheduler.h: PS4, Line 489: > nit: whitespace, wrap at 90 characters Done PS4, Line 492: mt_fragment_exec_params_ > Is there a way around referring to a private member of the schedule here? this class deals with populating private members of that class, so no. PS4, Line 499: MtComputeFragmentExecParams > The separation between what structures are set/updated in QuerySchedule and QuerySchedule is really just a helper class which functions as a container for scheduling data. some of that data comes from the initial TExecRequest, some is created by the scheduling logic. i moved all static setup functions as well as purely mechanical functions into QuerySchedule (such as figuring out the number of remote fragment instances), but i think the actual scheduling logic should stay in the scheduler. PS4, Line 502: depth-first traversal > Is this implementation-detail relevant for the caller? it's needed for understanding the correctness - the bottom-up process relies on having scheduling data for the children before scheduling the parent fragment. PS4, Line 504: MtComputeFragmentExecParams > It'd be nice to have a different and more specific name, e.g. MtCreateFInst i would argue that the current name is more specific, because it does more than just the FInstanceExecParams. i could call this MtComputeMtFragmentExecParams, but didn't think that was necessary. also, keep in mind that the 'mt' prefixes are only temporary. http://gerrit.cloudera.org:8080/#/c/4054/4/be/src/service/query-exec-state.cc File be/src/service/query-exec-state.cc: PS4, Line 439: // TODO-MT: if exec_request_.query_options.mt_dop isn't set, set it to the pool : // default, if any > This should already happen in impala-{hs2,beeswax}-server when the query co removed -- To view, visit http://gerrit.cloudera.org:8080/4054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I240445199e22f009f4e72fdb8754eb8d77e3d680 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-HasComments: Yes