Re: [Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

Henry Robinson Mon, 17 Apr 2017 16:43:16 -0700

Oops, ignore the outdated PS2 comments. The PS6 ones are current though!

On 17 April 2017 at 16:41, Henry Robinson (Code Review) <[email protected]
> wrote:


> Henry Robinson has posted comments on this change.
>
> Change subject: IMPALA-2550: Switch to per-query exec rpc
> ......................................................................
>
>
> Patch Set 6:
>
> (14 comments)
>
> I quickly looked at some familiar files.
>
> http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/
> runtime/coordinator-backend-state.cc
> File be/src/runtime/coordinator-backend-state.cc:
>
> PS6, Line 46: using boost::lock_guard;
>             : using boost::mutex;
> #include "common/names.h" for these.
>
>
> PS6, Line 348: true
> the class comment says that Cancel() returns true if the cancellation
> happened - but it almost certainly didn't here.
>
>
> http://gerrit.cloudera.org:8080/#/c/6535/2/be/src/runtime/coordinator.cc
> File be/src/runtime/coordinator.cc:
>
> PS2, Line 252: reate BackendStates
>              :   bool has_coord_fragment = schedule_.GetCoordFragment() !=
> nullptr;
>              :   const TNetworkAddress& coord_address = ExecE
> doesn't need to be two statements.
>
>
> PS2, Line 262: kend_state->Init(entry.second, fra
> const &?
>
>
> PS2, Line 264:
> This would be better named coord_backend_idx_.
>
>
> PS2, Line 1000:   // debugging aid for back
> Acquiring this lock is problematic for an RPC handler. See <TODO>.
>
>
> http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/coordinator.cc
> File be/src/runtime/coordinator.cc:
>
> PS6, Line 408: // TODO: rename this metric
> Why not do this now? (But note that for the moment that name isn't
> published in the profile anywhere).
>
>
> PS6, Line 415: || status.IsCancelled()
> When would we have a backend that was cancelled at this point (since
> cancellation always originates from the coordinator)? And if there was a
> cancelled backend, why overwrite its status with the error state? From a
> user perspective, if they cancel the query, surfacing an error would be
> confusing.
>
>
> PS6, Line 858: Report aggregate
>              :   // query profiles at this point.
> This comment seems out-of-sync with the rename of ReportQuerySummary().
>
>
> PS6, Line 885: TODO: do we need this error propagation path, in addition
> to the status report
>              :   // we'll get anyway?
> This error is needed so that client-facing operations can fail-fast. The
> next coordinator status report might be seconds away.
>
>
> PS6, Line 986: if (!(returned_all_results_ && status.IsCancelled()) &&
> !status.ok()) {
> Will you include the fix for IMPALA-4925 here as discussed? (see
> https://gerrit.cloudera.org/#/c/5987/1/be/src/runtime/coordinator.cc)
>
>
> PS6, Line 993: lock_guard<mutex> l(lock_);
> We spoke offline about removing this lock acquisition in this patch (see
> https://gerrit.cloudera.org/#/c/5971/2 for my patch a month or so ago).
> This is too expensive to take in an RPC handler - when a query completes
> this can lead to significant convoying on this lock, and with KRPC the
> reactor threads can get serialized (this would still be an issue even if
> the reports were batched). Do you want to tackle this in this patch or in a
> follow-on?
>
>
> http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.cc
> File be/src/runtime/query-state.cc:
>
> PS6, Line 199: // TODO: what to do with this? should we treat a failure to
> get a backend
>              :     // connection as a failure of instance execution?
> Yes.
>
> The report RPC from a backend to the coordinator is how the backend tells
> if it's orphaned from the coordinator. That should cause the backend to
> fail, because it can't receive cancellation messages, and might otherwise
> block forever.
>
> We CancelInternal() later if the RPC itself fails. This is just another
> way that the RPC can logically fail. KRPC abstracts this away.
>
>
> http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.h
> File be/src/runtime/query-state.h:
>
> PS6, Line 65:  - guarantee delivery of the ReportExecStatus rpc in case of
> an error, otherwise
>             : ///   the query might hang
> We can't guarantee delivery - not quite sure what you mean here.
> If the RPC can't be sent, the query state should get torn down on the
> assumption that the coordinator has failed.
>
>
> --
> To view, visit http://gerrit.cloudera.org:8080/6535
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
> Gerrit-PatchSet: 6
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Marcel Kornacker <[email protected]>
> Gerrit-Reviewer: Dan Hecht <[email protected]>
> Gerrit-Reviewer: Henry Robinson <[email protected]>
> Gerrit-Reviewer: Marcel Kornacker <[email protected]>
> Gerrit-Reviewer: Michael Ho <[email protected]>
> Gerrit-Reviewer: Tim Armstrong <[email protected]>
> Gerrit-HasComments: Yes
>
> --
> You received this message because you are subscribed to the Google Groups
> "impala-cr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/a/cloudera.com/d/optout.
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: [Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

Reply via email to