Oops, ignore the outdated PS2 comments. The PS6 ones are current though! On 17 April 2017 at 16:41, Henry Robinson (Code Review) <[email protected] > wrote:
> Henry Robinson has posted comments on this change. > > Change subject: IMPALA-2550: Switch to per-query exec rpc > ...................................................................... > > > Patch Set 6: > > (14 comments) > > I quickly looked at some familiar files. > > http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/ > runtime/coordinator-backend-state.cc > File be/src/runtime/coordinator-backend-state.cc: > > PS6, Line 46: using boost::lock_guard; > : using boost::mutex; > #include "common/names.h" for these. > > > PS6, Line 348: true > the class comment says that Cancel() returns true if the cancellation > happened - but it almost certainly didn't here. > > > http://gerrit.cloudera.org:8080/#/c/6535/2/be/src/runtime/coordinator.cc > File be/src/runtime/coordinator.cc: > > PS2, Line 252: reate BackendStates > : bool has_coord_fragment = schedule_.GetCoordFragment() != > nullptr; > : const TNetworkAddress& coord_address = ExecE > doesn't need to be two statements. > > > PS2, Line 262: kend_state->Init(entry.second, fra > const &? > > > PS2, Line 264: > This would be better named coord_backend_idx_. > > > PS2, Line 1000: // debugging aid for back > Acquiring this lock is problematic for an RPC handler. See <TODO>. > > > http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/coordinator.cc > File be/src/runtime/coordinator.cc: > > PS6, Line 408: // TODO: rename this metric > Why not do this now? (But note that for the moment that name isn't > published in the profile anywhere). > > > PS6, Line 415: || status.IsCancelled() > When would we have a backend that was cancelled at this point (since > cancellation always originates from the coordinator)? And if there was a > cancelled backend, why overwrite its status with the error state? From a > user perspective, if they cancel the query, surfacing an error would be > confusing. > > > PS6, Line 858: Report aggregate > : // query profiles at this point. > This comment seems out-of-sync with the rename of ReportQuerySummary(). > > > PS6, Line 885: TODO: do we need this error propagation path, in addition > to the status report > : // we'll get anyway? > This error is needed so that client-facing operations can fail-fast. The > next coordinator status report might be seconds away. > > > PS6, Line 986: if (!(returned_all_results_ && status.IsCancelled()) && > !status.ok()) { > Will you include the fix for IMPALA-4925 here as discussed? (see > https://gerrit.cloudera.org/#/c/5987/1/be/src/runtime/coordinator.cc) > > > PS6, Line 993: lock_guard<mutex> l(lock_); > We spoke offline about removing this lock acquisition in this patch (see > https://gerrit.cloudera.org/#/c/5971/2 for my patch a month or so ago). > This is too expensive to take in an RPC handler - when a query completes > this can lead to significant convoying on this lock, and with KRPC the > reactor threads can get serialized (this would still be an issue even if > the reports were batched). Do you want to tackle this in this patch or in a > follow-on? > > > http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.cc > File be/src/runtime/query-state.cc: > > PS6, Line 199: // TODO: what to do with this? should we treat a failure to > get a backend > : // connection as a failure of instance execution? > Yes. > > The report RPC from a backend to the coordinator is how the backend tells > if it's orphaned from the coordinator. That should cause the backend to > fail, because it can't receive cancellation messages, and might otherwise > block forever. > > We CancelInternal() later if the RPC itself fails. This is just another > way that the RPC can logically fail. KRPC abstracts this away. > > > http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.h > File be/src/runtime/query-state.h: > > PS6, Line 65: - guarantee delivery of the ReportExecStatus rpc in case of > an error, otherwise > : /// the query might hang > We can't guarantee delivery - not quite sure what you mean here. > If the RPC can't be sent, the query state should get torn down on the > assumption that the coordinator has failed. > > > -- > To view, visit http://gerrit.cloudera.org:8080/6535 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd > Gerrit-PatchSet: 6 > Gerrit-Project: Impala-ASF > Gerrit-Branch: master > Gerrit-Owner: Marcel Kornacker <[email protected]> > Gerrit-Reviewer: Dan Hecht <[email protected]> > Gerrit-Reviewer: Henry Robinson <[email protected]> > Gerrit-Reviewer: Marcel Kornacker <[email protected]> > Gerrit-Reviewer: Michael Ho <[email protected]> > Gerrit-Reviewer: Tim Armstrong <[email protected]> > Gerrit-HasComments: Yes > > -- > You received this message because you are subscribed to the Google Groups > "impala-cr" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/a/cloudera.com/d/optout. > -- Henry Robinson Software Engineer Cloudera 415-994-6679
