[GitHub] [hadoop-ozone] bharatviswa504 opened a new pull request #815: HDDS-3219. Write operation when both OM followers are shutdown.

GitBox Mon, 13 Apr 2020 16:25:50 -0700

bharatviswa504 opened a new pull request #815: HDDS-3219. Write operation when 
both OM followers are shutdown.
URL: https://github.com/apache/hadoop-ozone/pull/815
 
 
   ## What changes were proposed in this pull request?
   
   Added a new parameter for the OM RPC client timeout. This way, the timeout 
affects only the OM RPC client.
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-3291
   
   ## How was this patch tested?
   
   Tested this on a Docker cluster with the settings below. The leader election 
timeout is raised to a larger value so that an OM keeps thinking it is the 
leader for a longer period even though it is not. Without this patch, the 
request is accepted by that OM and the client retries forever.
   OZONE-SITE.XML_ozone.om.client.rpc.timeout=30s
   OZONE-SITE.XML_ozone.om.leader.election.minimum.timeout.duration=1m
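
The two lines above use the `OZONE-SITE.XML_<name>=<value>` form. Assuming the Docker setup turns each such entry into a property in `ozone-site.xml` (the description relies on this but does not spell it out), the following Python sketch shows roughly what the resulting configuration would look like. The helper `env_to_site_xml` is hypothetical and written only for illustration; it is not part of Ozone.

```python
from xml.sax.saxutils import escape


def env_to_site_xml(lines, prefix="OZONE-SITE.XML_"):
    """Render PREFIX_name=value lines as a Hadoop-style <configuration> block.

    Hypothetical helper: it only illustrates the assumed mapping from the
    environment-style entries to ozone-site.xml properties.
    """
    out = ["<configuration>"]
    for line in lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue  # skip entries that target some other config file
        name, _, value = line[len(prefix):].partition("=")
        out.append("  <property>")
        out.append(f"    <name>{escape(name)}</name>")
        out.append(f"    <value>{escape(value)}</value>")
        out.append("  </property>")
    out.append("</configuration>")
    return "\n".join(out)


# The two settings used for the test in this pull request.
settings = [
    "OZONE-SITE.XML_ozone.om.client.rpc.timeout=30s",
    "OZONE-SITE.XML_ozone.om.leader.election.minimum.timeout.duration=1m",
]
print(env_to_site_xml(settings))
```

Here `ozone.om.client.rpc.timeout` is the new client-side parameter, and the 30s value matches the `30000 millis timeout` reported in the logs.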
   
   With this patch, the request fails after 15 retries. For the OM server that 
still thinks it is the leader, the client gets a SocketTimeoutException and 
moves on to the next OM.
   
   Logs:
   ```
   2020-04-13 21:59:44,667 [main] INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om3":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy20.submitRequest over nodeId=om3,nodeAddress=om3:9862 after 13 failover 
attempts. Trying to failover immediately.
   2020-04-13 21:59:44,667 [main] INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid 
host name: local host is: (unknown); destination host is: "om1":9862; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost, while invoking 
$Proxy20.submitRequest over nodeId=om1,nodeAddress=om1:9862 after 14 failover 
attempts. Trying to failover immediately.
   2020-04-13 22:00:14,677 [main] INFO  RetryInvocationHandler:411 - 
com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call 
From 531e9bfac0d9/172.24.0.4 to om2:9862 failed on socket timeout exception: 
java.net.SocketTimeoutException: 30000 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/172.24.0.4:47798 remote=om2/172.24.0.7:9862]; For more details see:  
http://wiki.apache.org/hadoop/SocketTimeout, while invoking 
$Proxy20.submitRequest over nodeId=om2,nodeAddress=om2:9862 after 15 failover 
attempts. Trying to failover immediately.
   2020-04-13 22:00:14,678 [main] ERROR OMFailoverProxyProvider:286 - Failed to 
connect to OMs: [nodeId=om1,nodeAddress=om1:9862, 
nodeId=om3,nodeAddress=om3:9862, nodeId=om2,nodeAddress=om2:9862]. Attempted 15 
failover
   ```
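
The behaviour visible in the logs can be modelled with a small Python sketch: the client cycles through the OMs, an unreachable OM fails immediately, an OM that accepts the call but never answers costs one full RPC timeout, and the client gives up after a fixed number of failover attempts. This is a simplified simulation, not the actual `RetryInvocationHandler` / `OMFailoverProxyProvider` code. The names `submit_with_failover`, `Outcome`, and `stale_leader_cluster` are hypothetical, and the values 15 and 30 seconds are taken from this test run rather than asserted as Ozone defaults.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    succeeded: bool
    attempts: int
    elapsed_s: float  # simulated time spent waiting on RPC timeouts


def submit_with_failover(nodes, call, max_failovers=15, rpc_timeout_s=30.0):
    """Try OMs round-robin until one answers or max_failovers is reached.

    call(node) returns "ok", "unreachable" (fails immediately), or "hang"
    (never answers, so the client waits the full rpc_timeout_s).
    """
    elapsed = 0.0
    for attempt in range(1, max_failovers + 1):
        node = nodes[(attempt - 1) % len(nodes)]
        result = call(node)
        if result == "ok":
            return Outcome(True, attempt, elapsed)
        if result == "hang":
            elapsed += rpc_timeout_s  # bounded wait thanks to the client timeout
    return Outcome(False, max_failovers, elapsed)


def stale_leader_cluster(node):
    """Scenario from the test: om1 and om3 are down, om2 still thinks it leads."""
    return "hang" if node == "om2" else "unreachable"


outcome = submit_with_failover(["om1", "om2", "om3"], stale_leader_cluster)
print(outcome)
```

In this model the call ends with a failure after 15 attempts because every wait on `om2` is bounded by the client timeout. Without such a bound, a single "hang" would block the client indefinitely, which corresponds to the retry-forever behaviour described above.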


