Elek, Marton created HDDS-421:
---------------------------------

             Summary: Resilient DNS resolution in datanode-service 
                 Key: HDDS-421
                 URL: https://issues.apache.org/jira/browse/HDDS-421
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Datanode
            Reporter: Elek, Marton
            Assignee: Elek, Marton
             Fix For: 0.2.1


When I start big clusters on Kubernetes, I regularly get the same error:

If the DNS name of the SCM is not yet available while the datanode boots, 
the datanode won't connect to the SCM. It tries to reconnect, but the DNS 
resolution is not repeated.

The problem is in InitDatanodeState.call(). It calls getSCMAddresses, 
which creates the InetSocketAddress instances using the Hadoop utilities. While 
creating an InetSocketAddress, the Hadoop utilities try to resolve the 
address and store the result in the InetSocketAddress.
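
To illustrate the behaviour described above, here is a minimal sketch using only 
the JDK. The class name ResolveOnceDemo, the host name and the port are 
placeholders chosen for this example, not Ozone code. 
InetSocketAddress.createUnresolved() stands in for a lookup that failed, 
because a real failed lookup leaves the object in the same state.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of Ozone.
public class ResolveOnceDemo {

    /**
     * Returns true if the address carries no resolved IP. isUnresolved()
     * only reads the state stored at construction time; calling it again
     * never triggers a new DNS lookup.
     */
    public static boolean staysUnresolved(InetSocketAddress addr) {
        boolean first = addr.isUnresolved();
        boolean second = addr.isUnresolved();
        return first && second && addr.getAddress() == null;
    }

    public static void main(String[] args) {
        // Placeholder host and port: mimics an SCM name DNS cannot answer yet.
        InetSocketAddress pending =
            InetSocketAddress.createUnresolved("scm.example.invalid", 9861);
        // An IP literal needs no DNS lookup, so it is resolved right away.
        InetSocketAddress literal = new InetSocketAddress("127.0.0.1", 9861);

        System.out.println("pending unresolved: " + staysUnresolved(pending));
        System.out.println("literal unresolved: " + literal.isUnresolved());
    }
}
```

The only way to get a resolved address for the pending host is to create a new 
InetSocketAddress once DNS can answer.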

The address may still be unresolved, but InitDatanodeState.call() starts to 
use it anyway (connectionManager.addSCMServer), and there is no later attempt 
to resolve it.

My small proposal is to return immediately if any of the SCM addresses is 
unresolved, so that the main loop of the DatanodeStateMachine tries again 
(including the DNS resolution part).
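
A rough sketch of this idea, independent of the real Ozone classes: 
ScmInitSketch, tryInit, attemptsUntilReady and the resolver parameter are all 
hypothetical names for this example. tryInit plays the role of one 
InitDatanodeState.call() and the for loop plays the role of the state machine 
main loop. In real code the resolver would be something like 
host -> new InetSocketAddress(host, port), and the loop would wait between 
attempts.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch, not the actual Ozone implementation.
public class ScmInitSketch {

    /**
     * One init attempt. Addresses are created from scratch on every call,
     * so each attempt performs a fresh lookup. Returns empty as soon as any
     * address is unresolved, leaving the retry to the caller.
     */
    public static Optional<List<InetSocketAddress>> tryInit(
            List<String> hosts, Function<String, InetSocketAddress> resolver) {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (String host : hosts) {
            InetSocketAddress addr = resolver.apply(host);
            if (addr.isUnresolved()) {
                return Optional.empty(); // do not register anything yet
            }
            resolved.add(addr);
        }
        return Optional.of(resolved);
    }

    /** Stand-in for the main loop: returns the attempt that succeeded, or -1. */
    public static int attemptsUntilReady(
            List<String> hosts, Function<String, InetSocketAddress> resolver,
            int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryInit(hosts, resolver).isPresent()) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        // Fake resolver: the name only becomes resolvable on the third lookup.
        Function<String, InetSocketAddress> flaky = host ->
            ++lookups[0] < 3
                ? InetSocketAddress.createUnresolved(host, 9861)
                : new InetSocketAddress("127.0.0.1", 9861);
        System.out.println(
            attemptsUntilReady(Collections.singletonList("scm"), flaky, 10));
    }
}
```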

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
