Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14677 )
Change subject: IMPALA-9137: Blacklist node if a DataStreamService RPC to the node fails
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14677/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14677/10//COMMIT_MSG@18
PS10, Line 18: the target node.
> For coordinator C and executors S and T, this means that if S fails to send
S won't blacklist T since blacklists are only for Coordinators.

Yeah, I think there is a concern about where to assign "blame" when an RPC fails. If an RPC from S to T fails, this patch currently blacklists T, but it is possible that the actual issue is with S. This is potentially problematic: if the issue is with S, then T will be blacklisted for no reason, and the unhealthy node S will still be part of the cluster.

I've done a few experiments where I kill T, and the errors from S are:

ERROR: TransmitData() to 10.65.30.141:27000 failed: Network error: recv got EOF from 10.65.30.141:27000 (error 108)

ERROR: TransmitData() to 10.65.29.251:27000 failed: Network error: recv error from 0.0.0.0:0: Transport endpoint is not connected (error 107)

ERROR: TransmitData() to 10.65.26.254:27000 failed: Network error: Client connection negotiation failed: client connection to 10.65.26.254:27000: connect: Connection refused (error 111)

ERROR: EndDataStream() to 127.0.0.1:27002 failed: Network error: recv error from 0.0.0.0:0: Transport endpoint is not connected (error 107)

Maybe there is a way to make the blacklisting more specific to these exact errors. Let me see what I can find.

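To make the "more specific" idea concrete, here is a rough sketch of filtering on the error codes seen in the experiments above. It is only an illustration in Python, not what the patch does: the names parse_rpc_error, should_blacklist_target and TARGET_DOWN_ERRNOS are hypothetical, the message format is assumed to match the four log lines quoted above, and the errno values are the Linux ones (107 ENOTCONN, 108 ESHUTDOWN, 111 ECONNREFUSED).

```python
import re
from typing import Optional, Tuple

# Linux errno values observed in the experiments when the target node was killed.
# Treating exactly this set as "the target is down" is an assumption for illustration.
TARGET_DOWN_ERRNOS = {
    107,  # ENOTCONN: Transport endpoint is not connected
    108,  # ESHUTDOWN: surfaced in the logs as "recv got EOF"
    111,  # ECONNREFUSED: Connection refused
}

# Matches messages shaped like:
#   TransmitData() to 10.65.30.141:27000 failed: Network error: ... (error 108)
_RPC_ERROR = re.compile(
    r"(?P<rpc>\w+)\(\) to (?P<target>\S+) failed: Network error: "
    r".*\(error (?P<errno>\d+)\)"
)


def parse_rpc_error(message: str) -> Optional[Tuple[str, str, int]]:
    """Return (rpc name, target address, errno), or None if the message does not match."""
    m = _RPC_ERROR.search(message)
    if m is None:
        return None
    return m.group("rpc"), m.group("target"), int(m.group("errno"))


def should_blacklist_target(message: str) -> bool:
    """Blacklist only when the failure carries one of the allow-listed errno values."""
    parsed = parse_rpc_error(message)
    return parsed is not None and parsed[2] in TARGET_DOWN_ERRNOS
```

Any failure outside the allow-list, or any message that does not parse, leaves T alone. Matching one of these codes still does not prove that T rather than S is at fault; it only narrows blacklisting to the failures seen when T was actually killed.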
--
To view, visit http://gerrit.cloudera.org:8080/14677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I733cca13847fde43c8ea2ae574d3ae04bd06419c
Gerrit-Change-Number: 14677
Gerrit-PatchSet: 10
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Wed, 11 Dec 2019 19:01:53 +0000
Gerrit-HasComments: Yes
