Tim Armstrong has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc ......................................................................
Patch Set 9: (23 comments) Did a review pass while I don't have access to my desktop. http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: PS9, Line 293: UpdateExecStats The similarity between UpdateExecStats() and UpdateExecStatus() is potentially confusing. Spell out Statistics() maybe? Line 455: // TODO: we don't track cpu time per node now. Do that. More generally we should have an aggregate profile for each backend that includes things like this (query memory consumption, I/O, other aggregates). We don't need to do this now but it will be helpful http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: Line 132: const FInstanceExecParams& exec_params_; Might be helpful to document what owns the object being referenced. Line 164: std::vector<const FInstanceExecParams*> instance_params_list_; Might be helpful to document what owns the objects being referenced. Line 175: /// Protects fields below. Can be held while doing an RPC, so SpinLock is a bad idea. Our current SpinLock implementation should be fine (it blocks after spinning for a short period). Mutex should also be fine since this won't be heavily contented - I don't feel strongly. Line 219: /// TODO: including the median doesn't compile, looks like some includes are missing This TODO has been in the code since 2012 - maybe just remove it? Line 253: /// Root profile for all fragment instances for this fragment Stored in obj_pool? http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 364: f->targets()->push_back(target); Maybe: ->emplace_back(t_target, fragment_params.fragment.idx); Line 402: int num_instances = schedule_.GetNumFragmentInstances(); nit: intermediate variable seems unnecessary. Line 410: // TODO: rename this metric There are a handful of minor TODOs like this. I assume there's some reason not to do them in this patch, but it's not clear when/if it makes sense to address them. Do we have a JIRA or anything to track them? Line 938: // Notify that we completed with an error. This comment seems wrong - the cancellation path is used to clean up even if there isn't an error. Maybe just remove the comment? Line 956: // TODO: return here if returned_all_results_? I believe we need to do cancellation even if we returned all results. Having returned all results from the coordinator fragment doesn't imply that all fragments have returned all their results. The cancellation path is convoluted so I may be misunderstanding something though. Henry had a related patch in review: https://gerrit.cloudera.org/#/c/5987 Line 983: // TODO: clarify control flow here, it's unclear we should even process this status Henry had a patch that clarified this and improved the comment: https://gerrit.cloudera.org/#/c/5987/1/be/src/runtime/coordinator.cc http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: Line 108: /// TODO: clean up locking behavior; in particular, clarify dependency on lock_ Does any of the locking information in the impala-server.h need to be updated? Line 261: boost::mutex wait_lock_; A lot of these locks could safely be SpinLocks - unless they're used with a condition variable. Now that we have a blocking SpinLock it's better in most cases until you need a CV or strong fairness guarantees. http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 507: Extra line http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: Line 162: /// True if and only if Cancel() has been called. I had a hard time reasoning about concurrent accesses to this until I realised that it didn't actually influence query execution. Document that it's only used for DCHECKs? Or just remove it? http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: Line 111: //VLOG_QUERY << "AddChildTracker(): parent=" << this << " child=" << tracker << " " << tracker->label_ << "\n" << GetStackTrace(); I assume this will be removed. http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/query-exec-mgr.h File be/src/runtime/query-exec-mgr.h: Line 77: /// Execute instances and decrement refcount. I find it clearer to understand as: "Acquires ownership of 'qs'". http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: Line 43: #define RETRY_SLEEP_MS 100 Make a class-level constant? Line 311: // don't return until every instance is prepared and record the first non-OK status Comment isn't really accurate - it doesn't record the first non-OK status because of how it handles CANCELLED. http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: Line 257: void set_is_cancelled(bool v) { is_cancelled_ = v; } Not your change but related. This bool is accessed from multiple threads yet we don't do any synchronisation. Make it an Atomic32 so that we actually have some guarantees here? http://gerrit.cloudera.org:8080/#/c/6535/9/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 96: VLOG_QUERY << "FeSupport::NativeEvalExprs: " << texprs.size() << " exprs"; Could get very noisy depending on how many constant exprs there are in the query and how many partitions predicates on partition keys get evaluated against. -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
