Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14677 )

Change subject: IMPALA-9137: Blacklist node if a DataStreamService RPC to the 
node fails
......................................................................


Patch Set 9:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/krpc-data-stream-sender.cc@406
PS9, Line 406: unique_ptr
Not sure that it's necessary to use a unique_ptr and move() here - the 
StatusAuxInfo object is pretty small, so its not a big deal if RuntimeState 
just copies it.


http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/runtime-state.h@306
PS9, Line 306:   StatusAuxInfo& status_aux_info() {
I guess you're relying on the lock in StatusAuxInfo to make this thread-safe? I 
think we usually try to avoid exposing class members to be modified in this 
way, it's better encapsulation to use something like SetStatusAuxInfo(). Then 
you could add a 'status_aux_info_lock_', and it would be consistent with how we 
handle the error log, eg. with 'error_log_lock_'.

You might even then get rid of StatusAuxInfo entirely and just directly use the 
StatusAuxInfoPB objects, since StatusAuxInfo isn't really doing much.


http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/status-aux-info.h
File be/src/runtime/status-aux-info.h:

http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/status-aux-info.h@61
PS9, Line 61: StatusAuxInfoPB* CreateStatusAuxInfoPB()
Seems sort of weird to create a 'new StatusAuxInfoPB()" and then use 
set_allocated_status_aux_info().

I think the way we usually do this is to make this something like 
ToProtobuf(StatusAuxInfoPB* status_info), and then pass 
mutable_status_aux_info() in.


http://gerrit.cloudera.org:8080/#/c/14677/9/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/14677/9/common/protobuf/control_service.proto@155
PS9, Line 155: // Metadata that can be associated with a StatusPB object.
Might be useful to explain this a little more, eg. something like "Used to 
store extra info about errors encountered during fragment execution that the 
coordinator may need to handle, such as indicating that a node may be bad and 
should be blacklisted"


http://gerrit.cloudera.org:8080/#/c/14677/9/common/protobuf/control_service.proto@184
PS9, Line 184: optional StatusAuxInfoPB status_aux_info = 7;
Might make more sense to put this into FragmentInstanceExecStatusPB, since at 
least in this patch the errors that we're handling are specific to particular 
fragment instances.

This also makes it easier to handle the case where multiple fragments may have 
failed. As is, if multiple fragments fail for different reasons, you'll just 
drop any StatusAuxInfoPBs that don't correspond to failed_finstance_id_.


http://gerrit.cloudera.org:8080/#/c/14677/9/common/protobuf/control_service.proto@289
PS9, Line 289: }
?



--
To view, visit http://gerrit.cloudera.org:8080/14677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I733cca13847fde43c8ea2ae574d3ae04bd06419c
Gerrit-Change-Number: 14677
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Fri, 06 Dec 2019 23:31:08 +0000
Gerrit-HasComments: Yes

Reply via email to