Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14677 )

Change subject: IMPALA-9137: Blacklist node if a DataStreamService RPC to the 
node fails
......................................................................


Patch Set 12:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/coordinator.h@152
PS12, Line 152:   void UpdateBlacklistWithAuxErrorInfo(const 
ReportExecStatusRequestPB& request);
> This can be private
Done


http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/coordinator.h@263
PS12, Line 263:   boost::unordered_map<TNetworkAddress, BackendState*> 
addr_to_backend_state_;
> I think we might already have this mapping in QuerySchedule::per_backend_ex
Yeah, but that mapping uses Thrift address rather than KRPC addresses. Updated 
the docs to make this more clear.


http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/coordinator.cc@862
PS12, Line 862:         LOG(WARNING) << "Query failed due to a failed RPC to an 
unknown target address "
> Doesn't seem like this case should be possible, since 'addr_to_backend_stat
I added a DCHECK(false) and changed the log level to ERROR.


http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/coordinator.cc@879
PS12, Line 879:       static const set<int32_t> blacklistable_rpc_error_codes = 
{
> Do you examples of posix codes a NetworkError could generate that we wouldn
Discussed offline. There are a bunch of error codes 
(https://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/Errors/unix_system_errors.html)
 that don't seem useful to blacklist. I'm not sure what set of error codes KRPC 
is even possible of setting for network errors. Regardless, its probably safer 
to selectively add error codes to this list as we see fit.


http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/coordinator.cc@882
PS12, Line 882: ECONNREFUSED
> I think you can specify these directly instead of having to use the actual
Done


http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/coordinator.cc@892
PS12, Line 892: break;
> Was it your intention to not continue through the loop if the first RPCErro
Done


http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/14677/12/be/src/runtime/runtime-state.h@439
PS12, Line 439:   struct AuxErrorInfo {
> Why not just use AuxErrorInfoPB directly?
Done



--
To view, visit http://gerrit.cloudera.org:8080/14677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I733cca13847fde43c8ea2ae574d3ae04bd06419c
Gerrit-Change-Number: 14677
Gerrit-PatchSet: 12
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Thu, 19 Dec 2019 03:51:47 +0000
Gerrit-HasComments: Yes

Reply via email to