Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14677 )
Change subject: IMPALA-9137: Blacklist node if a DataStreamService RPC to the node fails ...................................................................... Patch Set 9: (6 comments) http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/krpc-data-stream-sender.cc@406 PS9, Line 406: unique_ptr Not sure that it's necessary to use a unique_ptr and move() here - the StatusAuxInfo object is pretty small, so its not a big deal if RuntimeState just copies it. http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/runtime-state.h@306 PS9, Line 306: StatusAuxInfo& status_aux_info() { I guess you're relying on the lock in StatusAuxInfo to make this thread-safe? I think we usually try to avoid exposing class members to be modified in this way, it's better encapsulation to use something like SetStatusAuxInfo(). Then you could add a 'status_aux_info_lock_', and it would be consistent with how we handle the error log, eg. with 'error_log_lock_'. You might even then get rid of StatusAuxInfo entirely and just directly use the StatusAuxInfoPB objects, since StatusAuxInfo isn't really doing much. http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/status-aux-info.h File be/src/runtime/status-aux-info.h: http://gerrit.cloudera.org:8080/#/c/14677/9/be/src/runtime/status-aux-info.h@61 PS9, Line 61: StatusAuxInfoPB* CreateStatusAuxInfoPB() Seems sort of weird to create a 'new StatusAuxInfoPB()" and then use set_allocated_status_aux_info(). I think the way we usually do this is to make this something like ToProtobuf(StatusAuxInfoPB* status_info), and then pass mutable_status_aux_info() in. http://gerrit.cloudera.org:8080/#/c/14677/9/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/14677/9/common/protobuf/control_service.proto@155 PS9, Line 155: // Metadata that can be associated with a StatusPB object. Might be useful to explain this a little more, eg. something like "Used to store extra info about errors encountered during fragment execution that the coordinator may need to handle, such as indicating that a node may be bad and should be blacklisted" http://gerrit.cloudera.org:8080/#/c/14677/9/common/protobuf/control_service.proto@184 PS9, Line 184: optional StatusAuxInfoPB status_aux_info = 7; Might make more sense to put this into FragmentInstanceExecStatusPB, since at least in this patch the errors that we're handling are specific to particular fragment instances. This also makes it easier to handle the case where multiple fragments may have failed. As is, if multiple fragments fail for different reasons, you'll just drop any StatusAuxInfoPBs that don't correspond to failed_finstance_id_. http://gerrit.cloudera.org:8080/#/c/14677/9/common/protobuf/control_service.proto@289 PS9, Line 289: } ? -- To view, visit http://gerrit.cloudera.org:8080/14677 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I733cca13847fde43c8ea2ae574d3ae04bd06419c Gerrit-Change-Number: 14677 Gerrit-PatchSet: 9 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Fri, 06 Dec 2019 23:31:08 +0000 Gerrit-HasComments: Yes
