Aaron Lindsey created GEODE-9666:
------------------------------------

             Summary: Client throws NoAvailableLocatorsException after locators 
change IP addresses
                 Key: GEODE-9666
                 URL: https://issues.apache.org/jira/browse/GEODE-9666
             Project: Geode
          Issue Type: Bug
          Components: membership
    Affects Versions: 1.15.0
            Reporter: Aaron Lindsey


We have a test for Geode on Kubernetes which:
 * Deploys a Geode cluster consisting of 2 locator Pods and 3 server Pods
 * Deploys 5 Spring Boot client Pods which continually do PUTs and GETs
 * Triggers a rolling restart of the locator Pods
 ** The rolling restart operation restarts one locator at a time, waiting for 
each restarted locator to become fully online before restarting the next locator
 * Stops the client operations and validates there were no exceptions thrown in 
the clients.

Occasionally, we see {{NoAvailableLocatorsException}} thrown on one of the 
clients:

{code:none}
org.apache.geode.cache.client.NoAvailableLocatorsException: Unable to connect to any locators in the list [system-test-gemfire-locator-0.system-test-gemfire-locator.gemfire-system-test-3f1ecc74-b1ea-4288-b4d1-594bbb8364ab.svc.cluster.local:10334, system-test-gemfire-locator-1.system-test-gemfire-locator.gemfire-system-test-3f1ecc74-b1ea-4288-b4d1-594bbb8364ab.svc.cluster.local:10334]
        at org.apache.geode.cache.client.internal.AutoConnectionSourceImpl.findServer(AutoConnectionSourceImpl.java:174)
        at org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:198)
        at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:196)
        at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:190)
        at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:276)
        at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:136)
        at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:119)
        at org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:801)
        at org.apache.geode.cache.client.internal.GetOp.execute(GetOp.java:92)
        at org.apache.geode.cache.client.internal.ServerRegionProxy.get(ServerRegionProxy.java:114)
        at org.apache.geode.internal.cache.LocalRegion.findObjectInSystem(LocalRegion.java:2802)
        at org.apache.geode.internal.cache.LocalRegion.getObject(LocalRegion.java:1469)
        at org.apache.geode.internal.cache.LocalRegion.nonTxnFindObject(LocalRegion.java:1442)
        at org.apache.geode.internal.cache.LocalRegionDataView.findObject(LocalRegionDataView.java:197)
        at org.apache.geode.internal.cache.LocalRegion.get(LocalRegion.java:1379)
        at org.apache.geode.internal.cache.LocalRegion.get(LocalRegion.java:1318)
        at org.apache.geode.internal.cache.LocalRegion.get(LocalRegion.java:1303)
        at org.apache.geode.internal.cache.AbstractRegion.get(AbstractRegion.java:439)
        at org.apache.geode.kubernetes.client.service.AsyncOperationService.evaluate(AsyncOperationService.java:282)
        at org.apache.geode.kubernetes.client.api.Controller.evaluateRegion(Controller.java:88)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:197)
        at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:141)
        at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:106)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:894)
        at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808)
        at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1063)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:963)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
        at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:626)
        at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:733)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:227)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
        at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
        at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
        at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:542)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:143)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
        at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:764)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:357)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:374)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:893)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1707)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.base/java.lang.Thread.run(Thread.java:829)
{code}

We do not expect any of the clients to throw {{NoAvailableLocatorsException}} 
because there is always at least one locator available during the test.

We did some investigation and found that:
 * Locator Pods get different IP addresses on Kubernetes after they are 
restarted, but they keep the same hostname.
 * After we see the {{NoAvailableLocatorsException}} thrown from a client, the 
client continues trying to contact the locators using stale IP addresses (i.e. 
the locators' original IP addresses from before they were restarted). We 
checked that the locators' DNS names are resolvable to the correct IP addresses 
from within the locator containers. We also ruled out the [JVM DNS cache 
settings|https://docs.oracle.com/javase/7/docs/technotes/guides/net/properties.html]
 as the cause of the stale IP addresses.
 * The changes for GEODE-9139 changed the behavior of 
{{org.apache.geode.distributed.internal.tcpserver.HostAndPort}} so that it 
permanently caches the resolved address after its first resolution attempt. 
This undoes part of the fix introduced by GEODE-7808, in which {{HostAndPort}} 
was created as a way to hold an unresolved hostname.
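
For anyone repeating the DNS cache check, here is a minimal sketch of how the 
JVM-level settings could be inspected from inside a client. The class name 
{{DnsCacheSettings}} is made up for illustration; the two property names are the 
standard Java security properties described on the page linked above. A value of 
{{null}} only means the property was not set explicitly, in which case the JVM 
default applies.

{code:java}
import java.security.Security;

// Hypothetical helper, not part of Geode or of our test code.
final class DnsCacheSettings {

  // Returns a one-line summary of the JVM DNS cache security properties.
  static String describe() {
    return format("networkaddress.cache.ttl") + ", "
        + format("networkaddress.cache.negative.ttl");
  }

  private static String format(String property) {
    // Security.getProperty returns null when the property is not set.
    String value = Security.getProperty(property);
    return property + "=" + (value == null ? "(unset, JVM default applies)" : value);
  }

  public static void main(String[] args) {
    System.out.println(describe());
  }
}
{code}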

In order to fix this issue, it seems like 
{{org.apache.geode.distributed.internal.tcpserver.HostAndPort}} should be 
changed so that when it contains an unresolved address, it will try to resolve 
the address each time {{getSocketInetAddress}} is called. This was the behavior 
in Geode 1.13 and 1.14, so changing it back shouldn't have a negative impact on 
performance.
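
The proposed behavior can be sketched roughly as follows. This is not the 
actual Geode code: {{ReResolvingHostAndPort}} and its {{Resolver}} hook are 
hypothetical names used only for illustration, and the real {{HostAndPort}} class 
has more responsibilities than shown here. The point is only that the hostname 
is kept and the lookup is repeated on every call to {{getSocketInetAddress}}, so 
a new Pod IP address is picked up as soon as DNS returns it.

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical, simplified stand-in for HostAndPort (illustration only).
final class ReResolvingHostAndPort {

  // Hypothetical lookup hook so the sketch can be exercised without real DNS.
  interface Resolver {
    InetAddress resolve(String hostName) throws UnknownHostException;
  }

  private final String hostName;
  private final int port;
  private final Resolver resolver;

  ReResolvingHostAndPort(String hostName, int port) {
    // Default lookup; still subject to the JVM DNS cache settings.
    this(hostName, port, InetAddress::getByName);
  }

  ReResolvingHostAndPort(String hostName, int port, Resolver resolver) {
    this.hostName = hostName;
    this.port = port;
    this.resolver = resolver;
  }

  // Resolves the hostname on every call instead of caching the first result.
  InetSocketAddress getSocketInetAddress() {
    try {
      return new InetSocketAddress(resolver.resolve(hostName), port);
    } catch (UnknownHostException e) {
      // Keep the hostname so that a later call can try again.
      return InetSocketAddress.createUnresolved(hostName, port);
    }
  }
}
{code}

With a resolver that returns a different address on each lookup, two 
consecutive calls to {{getSocketInetAddress}} return two different addresses, 
which is what a client needs after a locator Pod comes back with a new IP.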



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
