[ https://issues.apache.org/jira/browse/GEODE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Lindsey resolved GEODE-9666.
----------------------------------
    Fix Version/s: 1.15.0
       Resolution: Fixed

> Client throws NoAvailableLocatorsException after locators change IP addresses
> -----------------------------------------------------------------------------
>
>                 Key: GEODE-9666
>                 URL: https://issues.apache.org/jira/browse/GEODE-9666
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>    Affects Versions: 1.15.0
>            Reporter: Aaron Lindsey
>            Assignee: Aaron Lindsey
>            Priority: Major
>              Labels: needsTriage, pull-request-available
>             Fix For: 1.15.0
>
>
> We have a test for Geode on Kubernetes which:
>  * Deploys a Geode cluster consisting of 2 locator Pods and 3 server Pods
>  * Deploys 5 Spring Boot client Pods which continually do PUTs and GETs
>  * Triggers a rolling restart of the locator Pods
>  ** The rolling restart operation restarts one locator at a time, waiting for 
> each restarted locator to become fully online before restarting the next 
> locator
>  * Stops the client operations and validates that no exceptions were thrown 
> in the clients.
> Occasionally, we see {{NoAvailableLocatorsException}} thrown on one of the 
> clients:
> {code:none}
> org.apache.geode.cache.client.NoAvailableLocatorsException: Unable to connect to any locators in the list [system-test-gemfire-locator-0.system-test-gemfire-locator.gemfire-system-test-3f1ecc74-b1ea-4288-b4d1-594bbb8364ab.svc.cluster.local:10334, system-test-gemfire-locator-1.system-test-gemfire-locator.gemfire-system-test-3f1ecc74-b1ea-4288-b4d1-594bbb8364ab.svc.cluster.local:10334]
>       at org.apache.geode.cache.client.internal.AutoConnectionSourceImpl.findServer(AutoConnectionSourceImpl.java:174)
>       at org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:198)
>       at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:196)
>       at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:190)
>       at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:276)
>       at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:136)
>       at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:119)
>       at org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:801)
>       at org.apache.geode.cache.client.internal.GetOp.execute(GetOp.java:92)
>       at org.apache.geode.cache.client.internal.ServerRegionProxy.get(ServerRegionProxy.java:114)
>       at org.apache.geode.internal.cache.LocalRegion.findObjectInSystem(LocalRegion.java:2802)
>       at org.apache.geode.internal.cache.LocalRegion.getObject(LocalRegion.java:1469)
>       at org.apache.geode.internal.cache.LocalRegion.nonTxnFindObject(LocalRegion.java:1442)
>       at org.apache.geode.internal.cache.LocalRegionDataView.findObject(LocalRegionDataView.java:197)
>       at org.apache.geode.internal.cache.LocalRegion.get(LocalRegion.java:1379)
>       at org.apache.geode.internal.cache.LocalRegion.get(LocalRegion.java:1318)
>       at org.apache.geode.internal.cache.LocalRegion.get(LocalRegion.java:1303)
>       at org.apache.geode.internal.cache.AbstractRegion.get(AbstractRegion.java:439)
>       at org.apache.geode.kubernetes.client.service.AsyncOperationService.evaluate(AsyncOperationService.java:282)
>       at org.apache.geode.kubernetes.client.api.Controller.evaluateRegion(Controller.java:88)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>       at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:197)
>       at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:141)
>       at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:106)
>       at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:894)
>       at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808)
>       at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
>       at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1063)
>       at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:963)
>       at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
>       at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:626)
>       at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:733)
>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:227)
>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
>       at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
>       at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
>       at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
>       at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
>       at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
>       at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
>       at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
>       at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
>       at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
>       at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:542)
>       at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:143)
>       at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
>       at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
>       at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:764)
>       at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:357)
>       at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:374)
>       at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
>       at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:893)
>       at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1707)
>       at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>       at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
> We do not expect any of the clients to throw {{NoAvailableLocatorsException}} 
> because there is always at least one locator available during the test.
> We did some investigation and found that:
>  * Locator Pods get different IP addresses on Kubernetes after they are 
> restarted, but they keep the same hostname.
>  * After we see the {{NoAvailableLocatorsException}} thrown from a client, the 
> client keeps trying to contact the locators using stale IP addresses (i.e. 
> the locators' original IP addresses from before they were restarted). We 
> checked that the locators' DNS names resolve to the correct IP addresses 
> from within the locator containers. We also ruled out the
> [JVM DNS cache settings|https://docs.oracle.com/javase/7/docs/technotes/guides/net/properties.html]
> as the cause of the stale IP addresses.
>  * The changes for GEODE-9139 changed the behavior of 
> {{org.apache.geode.distributed.internal.tcpserver.HostAndPort}} so that it 
> permanently caches the resolved address after the first resolution attempt. 
> This undoes part of the fix introduced by GEODE-7808, in which HostAndPort 
> was created as a way to hold an unresolved hostname.
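To make the failure mode concrete, here is a minimal, self-contained sketch of the "resolve once, cache forever" behavior described above. It is not Geode's actual `HostAndPort` code: `CachingHostAndPort`, the `fakeDns` map standing in for Kubernetes DNS, and the `10.0.0.x` addresses are all hypothetical, chosen so the example runs without a real cluster.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class StaleAddressDemo {

  // Hypothetical holder mimicking the caching described in the issue:
  // the hostname is resolved once and that result is reused on every later call.
  static final class CachingHostAndPort {
    private final String host;
    private final int port;
    private final Function<String, InetAddress> resolver;
    private InetSocketAddress cached; // set on first call, never refreshed

    CachingHostAndPort(String host, int port, Function<String, InetAddress> resolver) {
      this.host = host;
      this.port = port;
      this.resolver = resolver;
    }

    InetSocketAddress getSocketInetAddress() {
      if (cached == null) {
        cached = new InetSocketAddress(resolver.apply(host), port);
      }
      return cached;
    }
  }

  // Builds an InetAddress for a fake 10.0.0.x IP without any real DNS lookup.
  static InetAddress fakeIp(String host, int lastOctet) {
    try {
      return InetAddress.getByAddress(host, new byte[] {10, 0, 0, (byte) lastOctet});
    } catch (UnknownHostException e) {
      // Only thrown for a malformed address array, which cannot happen here.
      throw new IllegalStateException(e);
    }
  }

  // Returns the IP the holder reports before and after a simulated Pod restart.
  static String[] simulateRestart() {
    Map<String, InetAddress> fakeDns = new HashMap<>();
    fakeDns.put("locator-0", fakeIp("locator-0", 5));
    CachingHostAndPort locator = new CachingHostAndPort("locator-0", 10334, fakeDns::get);

    String before = locator.getSocketInetAddress().getAddress().getHostAddress();
    // The restarted Pod keeps its hostname but comes back with a new IP.
    fakeDns.put("locator-0", fakeIp("locator-0", 9));
    String after = locator.getSocketInetAddress().getAddress().getHostAddress();
    return new String[] {before, after};
  }

  public static void main(String[] args) {
    String[] ips = simulateRestart();
    System.out.println("before restart: " + ips[0]); // 10.0.0.5
    System.out.println("after restart:  " + ips[1]); // still 10.0.0.5 (stale)
  }
}
```

Even though the simulated DNS now maps `locator-0` to `10.0.0.9`, the holder keeps returning the address it resolved first, which matches the stale-IP symptom the clients showed.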
> To fix this issue, it seems that 
> {{org.apache.geode.distributed.internal.tcpserver.HostAndPort}} should be 
> changed so that, when it holds an unresolved address, it tries to resolve 
> the address each time {{getSocketInetAddress}} is called. This was the 
> behavior in Geode 1.13 and 1.14, so restoring it should not have a negative 
> impact on performance.
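A minimal sketch of the proposed change, assuming only JDK classes. `ReResolvingHostAndPort` is a hypothetical stand-in, not Geode's actual `HostAndPort` implementation: it stores the hostname as an unresolved `InetSocketAddress` and performs a fresh lookup on every call to `getSocketInetAddress()`, while an address that was already resolved when it was supplied is returned unchanged.

```java
import java.net.InetSocketAddress;

public class ReResolvingHostAndPort {
  private final InetSocketAddress address;

  // Hostname form: stored unresolved so that DNS lookups are deferred to each call.
  public ReResolvingHostAndPort(String host, int port) {
    this.address = InetSocketAddress.createUnresolved(host, port);
  }

  // Pre-resolved form (for example, an explicit IP): returned as-is if already resolved.
  public ReResolvingHostAndPort(InetSocketAddress resolved) {
    this.address = resolved;
  }

  public InetSocketAddress getSocketInetAddress() {
    if (address.isUnresolved()) {
      // The (String, int) constructor attempts resolution immediately. If the lookup
      // fails it returns an address whose isUnresolved() is true instead of throwing,
      // so the caller can simply retry later.
      return new InetSocketAddress(address.getHostString(), address.getPort());
    }
    return address;
  }

  public static void main(String[] args) {
    ReResolvingHostAndPort locator = new ReResolvingHostAndPort("localhost", 10334);
    InetSocketAddress first = locator.getSocketInetAddress();
    InetSocketAddress second = locator.getSocketInetAddress();
    System.out.println(first + " resolved=" + !first.isUnresolved());
    System.out.println("fresh lookup per call: " + (first != second));
  }
}
```

Each lookup still goes through the JVM's own `InetAddress` cache, whose lifetime for successful lookups is controlled by the `networkaddress.cache.ttl` security property, so a restarted locator's new IP becomes visible once that cache entry expires instead of never.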



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
