Aleksey Plekhanov created IGNITE-21630:
------------------------------------------

             Summary: Cluster falls apart on topology change when DNS service is unavailable
                 Key: IGNITE-21630
                 URL: https://issues.apache.org/jira/browse/IGNITE-21630
             Project: Ignite
          Issue Type: Bug
            Reporter: Aleksey Plekhanov
            Assignee: Aleksey Plekhanov


Some critical discovery threads perform requests to the DNS service 
synchronously. The timeout for such requests can't be controlled from Java 
code (see [https://bugs.openjdk.org/browse/JDK-6450279]). When the DNS service 
is unavailable, these threads block, which leads to node segmentation and the 
cluster falling apart.
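
To illustrate the problem, here is a minimal sketch of one way a caller can bound its own wait for a lookup: run it on a helper thread and give up after a deadline. The names {{BoundedDnsResolver}} and {{resolve}} are hypothetical and are not part of Ignite, and this is not necessarily the fix chosen for this issue. The underlying lookup may keep running on the helper thread after the deadline; only the calling thread is released.

```java
import java.net.InetAddress;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not Ignite API): resolves a host name with a bounded wait. */
final class BoundedDnsResolver {
    // Daemon threads, so a lookup stuck in native code never blocks JVM shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-dns-resolver");
        t.setDaemon(true);
        return t;
    });

    private BoundedDnsResolver() {
    }

    /**
     * Returns the resolved address, or an empty Optional if the lookup fails
     * or does not complete within timeoutMillis.
     */
    static Optional<InetAddress> resolve(String host, long timeoutMillis) {
        Callable<InetAddress> lookup = () -> InetAddress.getByName(host);
        Future<InetAddress> fut = POOL.submit(lookup);

        try {
            return Optional.of(fut.get(timeoutMillis, TimeUnit.MILLISECONDS));
        }
        catch (TimeoutException e) {
            // Best effort only: a native lookup may ignore the interrupt.
            fut.cancel(true);
            return Optional.empty();
        }
        catch (ExecutionException e) {
            // For example, UnknownHostException thrown by the lookup.
            return Optional.empty();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            fut.cancel(true);
            return Optional.empty();
        }
    }
}
```

A caller on a latency-sensitive thread could then call {{BoundedDnsResolver.resolve(host, 1000)}} and treat an empty result as "address currently unknown" rather than blocking indefinitely.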

For example, the stack trace of a {{tcp-disco-msg-worker}} thread making a 
request to the DNS service:
{noformat}
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1330)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1283)
    at java.net.InetAddress.getAllByName(InetAddress.java:1199)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at java.net.InetSocketAddress.<init>(InetSocketAddress.java:220)
    at org.apache.ignite.internal.util.IgniteUtils.createResolved(IgniteUtils.java:9829)
    at org.apache.ignite.internal.util.IgniteUtils.toSocketAddresses(IgniteUtils.java:9792)
    at org.apache.ignite.internal.util.IgniteUtils.toSocketAddresses(IgniteUtils.java:9770)
    at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode.socketAddresses(TcpDiscoveryNode.java:392)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.getNodeAddresses(TcpDiscoverySpi.java:1267)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.interruptPing(ServerImpl.java:985)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.access$6800(ServerImpl.java:206)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeLeftMessage(ServerImpl.java:5433)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3221)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2894)
{noformat}
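
The trace above enters DNS through the {{InetSocketAddress(String, int)}} constructor, which attempts to resolve the host name eagerly. As a hedged illustration only, the JDK also offers {{InetSocketAddress.createUnresolved}}, which builds an address without any lookup so that resolution can be deferred to a less critical thread. The class name {{LazyAddressExample}} and the host name used in the comment are hypothetical, and whether deferring resolution is appropriate for the discovery code is an assumption, not something this report states.

```java
import java.net.InetSocketAddress;

/** Hypothetical example (not Ignite API): builds a socket address without a DNS lookup. */
final class LazyAddressExample {
    private LazyAddressExample() {
    }

    /**
     * Creates an address that only stores the host name and port.
     * No name service request is made here, so this call cannot block on DNS.
     */
    static InetSocketAddress unresolved(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```

For instance, {{LazyAddressExample.unresolved("node-1.example.invalid", 47500)}} returns an address for which {{isUnresolved()}} is true; code that later needs an IP address would still have to resolve it somewhere, ideally off the message worker thread.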



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
