Henry Robinson has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc ......................................................................
Patch Set 6: (14 comments) I quickly looked at some familiar files. http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: PS6, Line 46: using boost::lock_guard; : using boost::mutex; #include "common/names.h" for these. PS6, Line 348: true the class comment says that Cancel() returns true if the cancellation happened - but it almost certainly didn't here. http://gerrit.cloudera.org:8080/#/c/6535/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS2, Line 252: reate BackendStates : bool has_coord_fragment = schedule_.GetCoordFragment() != nullptr; : const TNetworkAddress& coord_address = ExecE doesn't need to be two statements. PS2, Line 262: kend_state->Init(entry.second, fra const &? PS2, Line 264: This would be better named coord_backend_idx_. PS2, Line 1000: // debugging aid for back Acquiring this lock is problematic for an RPC handler. See <TODO>. http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS6, Line 408: // TODO: rename this metric Why not do this now? (But note that for the moment that name isn't published in the profile anywhere). PS6, Line 415: || status.IsCancelled() When would we have a backend that was cancelled at this point (since cancellation always originates from the coordinator)? And if there was a cancelled backend, why overwrite its status with the error state? From a user perspective, if they cancel the query, surfacing an error would be confusing. PS6, Line 858: Report aggregate : // query profiles at this point. This comment seems out-of-sync with the rename of ReportQuerySummary(). PS6, Line 885: TODO: do we need this error propagation path, in addition to the status report : // we'll get anyway? This error is needed so that client-facing operations can fail-fast. The next coordinator status report might be seconds away. PS6, Line 986: if (!(returned_all_results_ && status.IsCancelled()) && !status.ok()) { Will you include the fix for IMPALA-4925 here as discussed? (see https://gerrit.cloudera.org/#/c/5987/1/be/src/runtime/coordinator.cc) PS6, Line 993: lock_guard<mutex> l(lock_); We spoke offline about removing this lock acquisition in this patch (see https://gerrit.cloudera.org/#/c/5971/2 for my patch a month or so ago). This is too expensive to take in an RPC handler - when a query completes this can lead to significant convoying on this lock, and with KRPC the reactor threads can get serialized (this would still be an issue even if the reports were batched). Do you want to tackle this in this patch or in a follow-on? http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS6, Line 199: // TODO: what to do with this? should we treat a failure to get a backend : // connection as a failure of instance execution? Yes. The report RPC from a backend to the coordinator is how the backend tells if it's orphaned from the coordinator. That should cause the backend to fail, because it can't receive cancellation messages, and might otherwise block forever. We CancelInternal() later if the RPC itself fails. This is just another way that the RPC can logically fail. KRPC abstracts this away. http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.h File be/src/runtime/query-state.h: PS6, Line 65: - guarantee delivery of the ReportExecStatus rpc in case of an error, otherwise : /// the query might hang We can't guarantee delivery - not quite sure what you mean here. If the RPC can't be sent, the query state should get torn down on the assumption that the coordinator has failed. -- To view, visit http://gerrit.cloudera.org:8080/6535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
