Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14677 )

Change subject: IMPALA-9137: Blacklist node if a DataStreamService RPC to the 
node fails
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14677/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14677/10//COMMIT_MSG@18
PS10, Line 18: the target node.
> For coordinator C and executors S and T, this means that if S fails to send
S won't blacklist T, since only coordinators maintain blacklists.
Yeah, I think there is a question of where to assign "blame" when an RPC
fails. If an RPC from S to T fails, this patch currently blacklists T, but the
actual issue may be with S. That is problematic: if an RPC from S to T fails
because S is unhealthy, T gets blacklisted for no reason, and the unhealthy
node S remains part of the cluster.
I've run a few experiments where I kill T, and the errors reported by S are:

 ERROR: TransmitData() to 10.65.30.141:27000 failed: Network error: recv got EOF from 10.65.30.141:27000 (error 108)
 ERROR: TransmitData() to 10.65.29.251:27000 failed: Network error: recv error from 0.0.0.0:0: Transport endpoint is not connected (error 107)
 ERROR: TransmitData() to 10.65.26.254:27000 failed: Network error: Client connection negotiation failed: client connection to 10.65.26.254:27000: connect: Connection refused (error 111)
 ERROR: EndDataStream() to 127.0.0.1:27002 failed: Network error: recv error from 0.0.0.0:0: Transport endpoint is not connected (error 107)
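The messages above share one shape, `<Rpc>() to <host:port> failed: <detail> (error N)`, where N is an errno value (on Linux, 107 is ENOTCONN, 108 is ESHUTDOWN, and 111 is ECONNREFUSED). A minimal C++17 sketch that pulls the RPC name, target address, and error code out of such a message, assuming exactly that format. `RpcFailure` and `ParseRpcFailure` are hypothetical names, not Impala APIs, and parsing the message text is only for illustration:

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical: the fields of interest in one failed-RPC error message.
struct RpcFailure {
  std::string rpc;     // e.g. "TransmitData" or "EndDataStream"
  std::string target;  // e.g. "10.65.30.141:27000"
  int error_code;      // value of the trailing "(error N)"; an errno on Linux
};

// Hypothetical parser for messages shaped like
//   "[ERROR: ]<Rpc>() to <host:port> failed: <detail> (error <N>)".
// Returns std::nullopt when the message does not have that shape.
std::optional<RpcFailure> ParseRpcFailure(const std::string& msg) {
  static const std::regex kPattern(
      R"(^\s*(?:ERROR: )?(\w+)\(\) to (\S+) failed: .*\(error (\d+)\)\s*$)");
  std::smatch m;
  if (!std::regex_match(msg, m, kPattern)) return std::nullopt;
  return RpcFailure{m[1].str(), m[2].str(), std::stoi(m[3].str())};
}
```

Applied to the first message above, this yields `rpc == "TransmitData"`, `target == "10.65.30.141:27000"`, and `error_code == 108`.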

Maybe there is a way to blacklist a node only when an RPC fails with one of
these specific errors. Let me see what I can find.
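A minimal sketch of that idea, assuming the errno of the failed RPC is available (for example, the trailing "(error N)" value in the messages above): treat only the codes observed when T was killed as evidence against the target, and leave every other failure unattributed. `IsTargetUnreachableError` is a hypothetical name, not part of this patch or of Impala:

```cpp
#include <cerrno>

// Hypothetical policy helper: returns true only for the errno values seen in
// the experiments above when the target process was killed. Any other error
// is not blamed on the target, so it would not trigger blacklisting.
bool IsTargetUnreachableError(int err) {
  switch (err) {
    case ECONNREFUSED:  // 111 on Linux: "Connection refused", nothing listening.
    case ENOTCONN:      // 107 on Linux: "Transport endpoint is not connected".
    case ESHUTDOWN:     // 108 on Linux: the code reported with "recv got EOF".
      return true;
    default:
      return false;
  }
}
```

Even this narrower check does not fully settle the blame question: a problem on S's side could plausibly surface as one of these codes too.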



--
To view, visit http://gerrit.cloudera.org:8080/14677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I733cca13847fde43c8ea2ae574d3ae04bd06419c
Gerrit-Change-Number: 14677
Gerrit-PatchSet: 10
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Wed, 11 Dec 2019 19:01:53 +0000
Gerrit-HasComments: Yes
