Denis Magda wrote:
>> Inside my service I'm using an IgniteCache in /Replicated/ mode from
>> Ignite 1.9.
>> Some replicas of this service run inside Kubernetes in form of Pods (1
>> Container/Pod).
>> I'm using the
>> /org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder/
>> for the Node Discovery.
>
> Do you mean that a part of the cluster is running outside of Kubernetes?
> If so, this might be an issue, because containerized Ignite nodes can't
> get through the network and reach your nodes that are outside.
>
> —
> Denis
>
>> On May 2, 2017, at 12:20 PM, keinproblem <noli.m@...> wrote:
>>
>> Dear Apache Ignite Users Community,
>>
>> This may be a well-known problem, although the currently available
>> information does not provide enough help for solving this issue.
>>
>> Inside my service I'm using an IgniteCache in /Replicated/ mode from
>> Ignite 1.9.
>> Some replicas of this service run inside Kubernetes in form of Pods (1
>> Container/Pod).
>> I'm using the
>> /org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder/
>> for the Node Discovery.
>> As I understood it, each Pod makes an API call to the Kubernetes API
>> and retrieves the list of currently available nodes. This works properly.
>> Even though the Pod's own IP will also be retrieved, which produces a
>> somewhat harmless warning.
>>
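>> To see what the IP finder actually returns, a minimal sketch like the
>> following could be used (the service name is a placeholder;
>> getRegisteredAddresses() is the call that queries the Kubernetes API):
>>
>> final TcpDiscoveryKubernetesIpFinder finder =
>>         new TcpDiscoveryKubernetesIpFinder();
>> finder.setServiceName("my-ignite-service"); // placeholder service name
>> // Expect one entry per Pod backing the service, including this Pod's own IP
>> for (final InetSocketAddress address : finder.getRegisteredAddresses())
>>     System.out.println(address);
>>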
>> Here is how I get my /IgniteCache/ and the /IgniteConfiguration/ used:
>>
>> public IgniteCache<String, MyCacheObject> getCacheInstance() {
>>     final CacheConfiguration<String, MyCacheObject> cacheConfiguration =
>>             new CacheConfiguration<>();
>>     cacheConfiguration.setName("MyObjectCache");
>>     // Replicated mode, as described above (defaults to PARTITIONED otherwise)
>>     cacheConfiguration.setCacheMode(CacheMode.REPLICATED);
>>     return ignite.getOrCreateCache(cacheConfiguration);
>> }
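>>
>> A quick usage sketch for illustration (assuming /MyCacheObject/ has a
>> no-argument constructor):
>>
>> final IgniteCache<String, MyCacheObject> cache = getCacheInstance();
>> cache.put("some-key", new MyCacheObject()); // replicated to all nodes
>> final MyCacheObject value = cache.get("some-key");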
>>
>> public static IgniteConfiguration getDefaultIgniteConfiguration() {
>>     final IgniteConfiguration cfg = new IgniteConfiguration();
>>     cfg.setGridLogger(new Slf4jLogger(log));
>>     cfg.setClientMode(false);
>>
>>     // Discover peers through the Kubernetes API, scoped to our service
>>     final TcpDiscoveryKubernetesIpFinder kubernetesPodIpFinder =
>>             new TcpDiscoveryKubernetesIpFinder();
>>     kubernetesPodIpFinder.setServiceName(SystemDataProvider.getServiceNameEnv);
>>
>>     final TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
>>     tcpDiscoverySpi.setIpFinder(kubernetesPodIpFinder);
>>     tcpDiscoverySpi.setLocalPort(47500); // static port, to reduce potential failure causes
>>
>>     cfg.setFailureDetectionTimeout(90000);
>>     cfg.setDiscoverySpi(tcpDiscoverySpi);
>>     return cfg;
>> }
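>>
>> For completeness, this is roughly how a node is started with that
>> configuration (a sketch; Ignition.start() returns once the node has
>> joined the topology):
>>
>> final Ignite ignite = Ignition.start(getDefaultIgniteConfiguration());
>> // Log the topology size as seen by this node
>> System.out.println("Nodes in cluster: " + ignite.cluster().nodes().size());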
>>
>>
>>
>> The initial node will start up properly every time.
>>
>> In most cases, roughly the third node trying to connect will fail and
>> get restarted by Kubernetes after some time. Sometimes this node will
>> succeed in connecting to the cluster after a few restarts, but the
>> common case is that the nodes keep restarting forever.
>>
>> But the major issue is that when a new node fails to connect to the
>> cluster, the cluster seems to become unstable: the number of nodes
>> increases for a very short time, then drops to the previous count or
>> even lower. I am not sure whether it is the newly connecting nodes
>> losing their connection immediately again, or the previously connected
>> nodes losing theirs.
>>
>>
>> I also deployed the bare Ignite Docker image, including a configuration
>> for the /TcpDiscoveryKubernetesIpFinder/, as described here:
>> https://apacheignite.readme.io/docs/kubernetes-deployment
>> Even with this minimal setup, I've experienced the same behavior.
>>
>> There is no load on the Ignite Nodes and the network usage is very low.
>>
>> Using another Kubernetes instance on another infrastructure showed the
>> same results, hence I assume this to be an Ignite-related issue.
>>
>> I also tried increasing specific timeouts such as /ackTimeout/ and
>> /sockTimeout/.
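>>
>> For reference, a sketch of how those timeouts can be raised on the
>> discovery SPI (the values are examples, not the exact ones I used):
>>
>> tcpDiscoverySpi.setAckTimeout(10000);     // wait longer for message acks
>> tcpDiscoverySpi.setSocketTimeout(10000);  // socket connect/write timeout
>> tcpDiscoverySpi.setNetworkTimeout(15000); // general network operation timeout
>> tcpDiscoverySpi.setJoinTimeout(60000);    // 0 would mean wait indefinitely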
>>
>> Using the /TcpDiscoveryVmIpFinder/ did not help either; there I obtained
>> all the endpoints via DNS. Same behavior as described above.
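>>
>> That variant looked roughly like this (a sketch; the DNS name is a
>> placeholder for the headless service resolving to the Pod endpoints):
>>
>> final TcpDiscoveryVmIpFinder vmIpFinder = new TcpDiscoveryVmIpFinder();
>> vmIpFinder.setAddresses(Arrays.asList(
>>         "my-ignite-service.default.svc.cluster.local:47500")); // placeholder DNS name
>> tcpDiscoverySpi.setIpFinder(vmIpFinder);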
>>
>> Please find attached a log file with information at WARN level. Please
>> let me know if DEBUG-level output is desired.
>>
>>
>>
>> Kind regards and thanks in advance,
>> keinproblem
>>
>>
>>
Hi Denis,
the whole cluster is running in Kubernetes.
So basically I just have connections between my pods.
Kind regards,
keinproblem