Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
......................................................................


Patch Set 4:

(50 comments)

http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

PS4, Line 66: 
> nit: wrong indentation
how so?


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

PS4, Line 58: //done_(false) {
> delete
Done


Line 64:   VLOG_QUERY << "BS::Init(): query_id=" << PrintId(query_id_) << " state_idx=" << state_idx_ << " host=" << host_;
> long line
i'll remove all of the extra debug logging at the end.


PS4, Line 104: rpc_params->fragment_ctxs.back().fragment.idx
             :     != params.fragment_exec_params.fragment.idx) {
> The assumption is that all fragment instances belonging to the same fragmen
Done


PS4, Line 112: params.fragment_exec_params.fragment.idx
> May as well factor this out as a local variable as it's used above too. The
Done


PS4, Line 170: lock_guard<mutex> l(lock_);
> Is it necessary to hold the lock while creating BackendConnection below ?
the lock is low-contention, and it protects status_.


PS4, Line 173: client_connect_status;
> Is this not used or did you intend to use this instead of status_ ?
Done


PS4, Line 184: const string ERR_TEMPLATE =
             :     "ExecQueryFInstances rpc query_id=$0 failed: $1";
> Shouldn't this live in common/thrift/generate_error_codes.py ?
out of scope for this patch.


PS4, Line 332: lock_guard<mutex> l2(lock_);
> How heavily contended is this lock ? Will this become a bottleneck if we ha
no, it'll (eventually) be the backend doing the reporting, not the individual instances.


PS4, Line 346: 
> nit: indentation is off.
Done


Line 365:   if (status_.ok()) {
> Is this block empty now ? It only contains comment, right ?
i left that in there as a reminder to myself to think this through again.


PS4, Line 391: num_remaining_instances_ == 0 || !status_.ok();
> Why not call IsDone() here ?
IsDone() simply returns a bool, not sure what you mean.


PS4, Line 393: //if (*done) stopwatch_.Stop();
> delete
Done


PS4, Line 434: if (!status.ok()) return false;
> Does this warrant a log message ? It's not entirely clear what's the status
it wasn't there before, and it has the potential to spam the log.


PS4, Line 513: NULL
> nullptr
Done


PS4, Line 549: //DCHECK_EQ(fragment_profiles_[instance_state.fragment_idx()].num_instances,
             :     //node_exec_summary.exec_stats.size());
> ?
Done


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

PS4, Line 45: .
> Can you please also add a comment stating that multiple fragments run a sam
the hierarchy fragment -> fragment instance is already well-documented elsewhere, so i'm not going to point that out again.


PS4, Line 100: //const vector<InstanceStats>& instance_stats() const { return instance_stats_; }
             : //MonotonicStopWatch* stopwatch() { return &stopwatch_; }
> delete
Done


PS4, Line 109: //InstanceStats* GetInstanceStats(const TUniqueId& instance_id);
> delete
Done


Line 111:     return num_remaining_instances_ == 0 || !status_.ok();
> nit: one line ?
only for getters and setters.


PS4, Line 113: &error_log_;
> How does this work if multiple callers modify the map at the same time ?
they don't, the comment above stipulates that you need to hold lock_.
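to illustrate the convention, here's a minimal sketch with made-up types (not the actual Impala code): the accessor itself doesn't synchronize anything, the locking contract lives in the comment.

  #include <map>
  #include <mutex>
  #include <string>

  class BackendState {
   public:
    // caller must hold lock_ for as long as it uses the returned reference
    const std::map<int, std::string>& error_log() const { return error_log_; }

    std::mutex lock_;

   private:
    std::map<int, std::string> error_log_;  // protected by lock_
  };

  void MergeErrors(BackendState* state) {
    std::lock_guard<std::mutex> l(state->lock_);  // acquire before touching the map
    for (const auto& entry : state->error_log()) { /* merge 'entry' */ }
  }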
PS4, Line 114: //const std::vector<RuntimeProfile*>& instance_profiles() const {
             : //return instance_profiles_;
             : //}
> delete
Done


PS4, Line 133: #if 0
> delete
Done


PS4, Line 143: /
> delete
Done


PS4, Line 223: //bool done_;
> Still needed ?
Done


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 113:     torn_down_(false) {}
> Should the constructor also initialize coord_state_idx_ to -1 in case there
Done


PS4, Line 187: FinishBackendStartup
> Would a better name be VerifyBackendStartup(); ?
Not particularly, because this really blocks until startup has finished.


PS4, Line 234: new FragmentStats(avg_profile_name, root_profile_name, num_instances, obj_pool())
> Should this be added to obj_pool() too ? fragment_stats_ only contains poin
i think that was the intention.


Line 340:   // Set the 'pending_count_' to zero to indicate that for a filter with local-only
> long line
Done


Line 397:   //CountingBarrier* exec_complete_barrier = exec_complete_barrier_.get();
> Delete ?
Done


PS4, Line 400: [backend_state, this, &debug_options]() {
             :     backend_state->Exec(query_ctx_, debug_options, filter_routing_table_,
             :         exec_complete_barrier_.get());
             : })
> IMHO, it seems easier to read if this is defined as a function like the exi
what would the function contain?


PS4, Line 421: //VLOG_QUERY << "backend status: idx=" << backend_state->st
> ?
Done


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

PS4, Line 246: Filled in
             : /// StartBackendExec().
> Filled in by InitBackendStates() ?
Done


PS4, Line 270: = nullptr;
> Do we have a guideline on whether pointer like this should be initialized i
i like it with the declaration: then you don't need to repeat it for every c'tor if you have several, and you won't forget to initialize null pointers.
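roughly this trade-off (a hypothetical class, just for illustration):

  class Executor;  // hypothetical forward declaration

  class Coordinator {
   public:
    Coordinator() = default;                 // executor_ is already nullptr here
    explicit Coordinator(Executor* e) : executor_(e) {}

   private:
    // the default member initializer applies to every constructor that
    // doesn't explicitly initialize executor_, so no c'tor can forget it
    Executor* executor_ = nullptr;
  };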
http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

Line 139:   new RuntimeState(query_state_, fragment_ctx_, instance_ctx_, ExecEnv::GetInstance()));
> long line
Done


PS4, Line 249: report_thread_.reset(
             :     new Thread("plan-fragment-executor", "report-profile",
             :         &FragmentInstanceState::ReportProfileThread, this));
> Is one reporting thread per fragment instance a bit much ? Have you conside
out of scope.


PS4, Line 267: //RETURN_IF_ERROR(quer
> delete
Done


PS4, Line 298: RETURN_IF_ERROR(status);
> Why not RETURN_IF_ERROR(exec_tree_->GetNext(...))) at line 294 instead ?
Done


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

PS4, Line 52: for
> long line
Done


PS4, Line 83: void Cancel();
> Please also document whether this is idempotent.
Done


PS4, Line 89: If the preparation phase encountered an error, GetOpenStatus() will
            : /// return that error without blocking.
> I don't see the opened_promise_ being updated when Prepare() fails in Exec(
good catch.
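the fix needs to publish the status on the failure path too; a minimal sketch with std::promise (the real code uses its own promise type, and these names are illustrative):

  #include <future>

  struct Status { bool ok; };

  class FragmentInstanceState {
   public:
    void Exec() {
      Status prepare_status = Prepare();
      if (!prepare_status.ok) {
        opened_promise_.set_value(prepare_status);  // unblock waiters on error
        return;
      }
      opened_promise_.set_value(Open());
    }

    // blocks until Exec() sets the promise, then returns the stored status
    Status GetOpenStatus() { return opened_future_.get(); }

   private:
    Status Prepare() { return Status{true}; }  // stand-in for the real phase
    Status Open() { return Status{true}; }     // stand-in for the real phase

    std::promise<Status> opened_promise_;
    std::future<Status> opened_future_ = opened_promise_.get_future();
  };

without the set_value() on the error path, a caller blocked in GetOpenStatus() would wait forever.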
http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS4, Line 56: // start new thread for query startup
> This is done so the RPC won't be blocked waiting for all FInstances to star
Done


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS4, Line 194: stringstream s;
> Did you mean to use s somewhere ? It looks like dead code as it stands now.
it's used in the line below.


PS4, Line 246: 3
> How is this determined ? Should we use #define or constexpr to represent it
we have this same retry loop elsewhere as well; we should probably fix this globally, but that's out of scope.
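for when we do fix it globally, a sketch of the named-constant version (hypothetical names, not the actual retry loop):

  #include <chrono>
  #include <thread>

  constexpr int NUM_RPC_RETRIES = 3;  // named once, shared by all retry loops

  template <typename RpcFn>
  bool RpcWithRetry(RpcFn rpc) {
    for (int attempt = 0; attempt < NUM_RPC_RETRIES; ++attempt) {
      if (rpc()) return true;  // success, stop retrying
      std::this_thread::sleep_for(std::chrono::milliseconds(100));  // backoff
    }
    return false;  // all attempts failed
  }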
PS4, Line 250: rpc_status = Status(res.status);
> Is this needed given line 256 below which assigns res.status to result_stat
Done


PS4, Line 317: Status status
> Not an error but there is a variable named status already in the outer scop
Done


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/service/impala-internal-service.cc
File be/src/service/impala-internal-service.cc:

Line 56: void ImpalaInternalService::CancelQueryFInstances(TCancelQueryFInstancesResult& return_val,
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/6535/4/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

PS4, Line 399: accross
> across
there are two indices, which come in handy in different situations:
- fragment index, which is part of fragment_instance_id
- per_fragment_instance_idx


Line 411: // Id of this instance in its role as a sender.
> Used by DataStreamSender ?
or whoever needs to deal with data streams.


PS4, Line 433: coord
> Will backend_state_idx or coord_backend_state_idx be better ?
it's the index of the coordinator's private per-backend state. i find the name clear enough.


--
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-HasComments: Yes