Henry Robinson has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
......................................................................


Patch Set 6:

(14 comments)

I quickly looked at some familiar files.

http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

PS6, Line 46: using boost::lock_guard;
            : using boost::mutex;
#include "common/names.h" for these.


PS6, Line 348: true
the class comment says that Cancel() returns true if the cancellation happened 
- but it almost certainly didn't here.


http://gerrit.cloudera.org:8080/#/c/6535/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS2, Line 252: reate BackendStates
             :   bool has_coord_fragment = schedule_.GetCoordFragment() != 
nullptr;
             :   const TNetworkAddress& coord_address = ExecE
doesn't need to be two statements.


PS2, Line 262: kend_state->Init(entry.second, fra
const &?


PS2, Line 264: 
This would be better named coord_backend_idx_.


PS2, Line 1000:   // debugging aid for back
Acquiring this lock is problematic for an RPC handler. See <TODO>.


http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS6, Line 408: // TODO: rename this metric
Why not do this now? (But note that for the moment that name isn't published in 
the profile anywhere).


PS6, Line 415: || status.IsCancelled()
When would we have a backend that was cancelled at this point (since 
cancellation always originates from the coordinator)? And if there was a 
cancelled backend, why overwrite its status with the error state? From a user 
perspective, if they cancel the query, surfacing an error would be confusing.


PS6, Line 858: Report aggregate
             :   // query profiles at this point.
This comment seems out-of-sync with the rename of ReportQuerySummary().


PS6, Line 885: TODO: do we need this error propagation path, in addition to the 
status report
             :   // we'll get anyway?
This error is needed so that client-facing operations can fail-fast. The next 
coordinator status report might be seconds away.


PS6, Line 986: if (!(returned_all_results_ && status.IsCancelled()) && 
!status.ok()) {
Will you include the fix for IMPALA-4925 here as discussed? (see 
https://gerrit.cloudera.org/#/c/5987/1/be/src/runtime/coordinator.cc)


PS6, Line 993: lock_guard<mutex> l(lock_);
We spoke offline about removing this lock acquisition in this patch (see 
https://gerrit.cloudera.org/#/c/5971/2 for my patch a month or so ago).  This 
is too expensive to take in an RPC handler - when a query completes this can 
lead to significant convoying on this lock, and with KRPC the reactor threads 
can get serialized (this would still be an issue even if the reports were 
batched). Do you want to tackle this in this patch or in a follow-on?


http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS6, Line 199: // TODO: what to do with this? should we treat a failure to get 
a backend
             :     // connection as a failure of instance execution?
Yes. 

The report RPC from a backend to the coordinator is how the backend tells if 
it's orphaned from the coordinator. That should cause the backend to fail, 
because it can't receive cancellation messages, and might otherwise block 
forever. 

We CancelInternal() later if the RPC itself fails. This is just another way 
that the RPC can logically fail. KRPC abstracts this away.


http://gerrit.cloudera.org:8080/#/c/6535/6/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

PS6, Line 65:  - guarantee delivery of the ReportExecStatus rpc in case of an 
error, otherwise
            : ///   the query might hang
We can't guarantee delivery - not quite sure what you mean here. 
If the RPC can't be sent, the query state should get torn down on the 
assumption that the coordinator has failed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to