Denis Magda wrote:
>> Inside my service I'm using an IgniteCache in /Replicated/ mode from
>> Ignite 1.9.
>> Some replicas of this service run inside Kubernetes in form of Pods (1
>> Container/Pod).
>> I'm using the
>> /org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder/
>> for the Node Discovery.
>
> Do you mean that a part of the cluster is running outside of Kubernetes?
> If so, this might be an issue, because containerized Ignite nodes can't
> get through the network and reach your nodes that are outside.
>
> —
> Denis
>
>> On May 2, 2017, at 12:20 PM, keinproblem <noli.m@...> wrote:
>>
>> Dear Apache Ignite Users Community,
>>
>> This may be a well-known problem, although the currently available
>> information does not provide enough help for solving this issue.
>>
>> Inside my service I'm using an IgniteCache in /Replicated/ mode from
>> Ignite 1.9.
>> Some replicas of this service run inside Kubernetes in form of Pods (1
>> Container/Pod).
>> I'm using the
>> /org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder/
>> for the Node Discovery.
>> As I understood it, each Pod makes an API call to the Kubernetes API
>> and retrieves the list of currently available nodes. This works properly.
>> Even though the Pod's own IP will also be retrieved, which produces a
>> somewhat harmless warning.
>>
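>> To see what the IP finder actually returns, a minimal sketch like the
>> following could be used (the service name is a placeholder;
>> getRegisteredAddresses() is the call that queries the Kubernetes API):
>>
>> final TcpDiscoveryKubernetesIpFinder finder =
>>         new TcpDiscoveryKubernetesIpFinder();
>> finder.setServiceName("my-ignite-service"); // placeholder service name
>> // Expect one entry per Pod backing the service, including this Pod's own IP
>> for (final InetSocketAddress address : finder.getRegisteredAddresses())
>>     System.out.println(address);
>>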
>> Here is how I get my /IgniteCache/ and the /IgniteConfiguration/ used:
>>
>> public IgniteCache<String, MyCacheObject> getCacheInstance() {
>>     final CacheConfiguration<String, MyCacheObject> cacheConfiguration =
>>             new CacheConfiguration<>();
>>     cacheConfiguration.setName("MyObjectCache");
>>     // Replicated mode, as described above (defaults to PARTITIONED otherwise)
>>     cacheConfiguration.setCacheMode(CacheMode.REPLICATED);
>>     return ignite.getOrCreateCache(cacheConfiguration);
>> }
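>>
>> A quick usage sketch for illustration (assuming /MyCacheObject/ has a
>> no-argument constructor):
>>
>> final IgniteCache<String, MyCacheObject> cache = getCacheInstance();
>> cache.put("some-key", new MyCacheObject()); // replicated to all nodes
>> final MyCacheObject value = cache.get("some-key");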
>>
>> public static IgniteConfiguration getDefaultIgniteConfiguration() {
>>     final IgniteConfiguration cfg = new IgniteConfiguration();
>>     cfg.setGridLogger(new Slf4jLogger(log));
>>     cfg.setClientMode(false);
>>
>>     // Discover peers through the Kubernetes API, scoped to our service
>>     final TcpDiscoveryKubernetesIpFinder kubernetesPodIpFinder =
>>             new TcpDiscoveryKubernetesIpFinder();
>>     kubernetesPodIpFinder.setServiceName(SystemDataProvider.getServiceNameEnv);
>>
>>     final TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
>>     tcpDiscoverySpi.setIpFinder(kubernetesPodIpFinder);
>>     tcpDiscoverySpi.setLocalPort(47500); // static port, to reduce potential failure causes
>>
>>     cfg.setFailureDetectionTimeout(90000);
>>     cfg.setDiscoverySpi(tcpDiscoverySpi);
>>     return cfg;
>> }
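>>
>> For completeness, this is roughly how a node is started with that
>> configuration (a sketch; Ignition.start() returns once the node has
>> joined the topology):
>>
>> final Ignite ignite = Ignition.start(getDefaultIgniteConfiguration());
>> // Log the topology size as seen by this node
>> System.out.println("Nodes in cluster: " + ignite.cluster().nodes().size());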
>>
>>
>>
>> The initial node will start up properly every time.
>>
>> In most cases, roughly the third node trying to connect will fail and
>> get restarted by Kubernetes after some time. Sometimes this node will
>> succeed in connecting to the cluster after a few restarts, but the
>> common case is that the nodes keep restarting forever.
>>
>> But the major issue is that when a new node fails to connect to the
>> cluster, the cluster seems to become unstable: the number of nodes
>> increases for a very short time, then drops to the previous count or
>> even lower. I am not sure whether it is the newly connecting nodes
>> losing their connection immediately again, or the previously connected
>> nodes losing theirs.
>>
>>
>> I also deployed the bare Ignite Docker image, including a configuration
>> for the /TcpDiscoveryKubernetesIpFinder/, as described here:
>> https://apacheignite.readme.io/docs/kubernetes-deployment
>> Even with this minimal setup, I've experienced the same behavior.
>>
>> There is no load on the Ignite Nodes and the network usage is very low.
>>
>> Using another Kubernetes instance on another infrastructure showed the
>> same results, hence I assume this to be an Ignite-related issue.
>>
>> I also tried increasing specific timeouts such as /ackTimeout/ and
>> /sockTimeout/.
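>>
>> For reference, a sketch of how those timeouts can be raised on the
>> discovery SPI (the values are examples, not the exact ones I used):
>>
>> tcpDiscoverySpi.setAckTimeout(10000);     // wait longer for message acks
>> tcpDiscoverySpi.setSocketTimeout(10000);  // socket connect/write timeout
>> tcpDiscoverySpi.setNetworkTimeout(15000); // general network operation timeout
>> tcpDiscoverySpi.setJoinTimeout(60000);    // 0 would mean wait indefinitely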
>>
>> Using the /TcpDiscoveryVmIpFinder/ did not help either; there I obtained
>> all the endpoints via DNS. Same behavior as described above.
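>>
>> That variant looked roughly like this (a sketch; the DNS name is a
>> placeholder for the headless service resolving to the Pod endpoints):
>>
>> final TcpDiscoveryVmIpFinder vmIpFinder = new TcpDiscoveryVmIpFinder();
>> vmIpFinder.setAddresses(Arrays.asList(
>>         "my-ignite-service.default.svc.cluster.local:47500")); // placeholder DNS name
>> tcpDiscoverySpi.setIpFinder(vmIpFinder);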
>>
>> Please find attached a log file with information at WARN level. Please
>> let me know if DEBUG-level output is desired.
>>
>>
>>
>> Kind regards and thanks in advance,
>> keinproblem
>>
>>
>>
Hi Denis,
the whole cluster is running in Kubernetes.
So basically I just have connections between my pods.
Kind regards,
keinproblem