---------- Forwarded message ---------
From: Alena Laas <alena.l...@cbsinteractive.com>
Date: Thu, Jan 10, 2019 at 5:13 PM
Subject: Ignite in Kubernetes not works correctly
To: <user@ignite.apache.org>
Cc: Vadim Shcherbakov <vadim.shcherba...@cbsinteractive.com>

Hello!

Could you please help with a problem we have running Ignite in a Kubernetes cluster?

When we start 2 Ignite nodes at the same time, or scale the Deployment from 1 to 2, everything is fine: both nodes are visible in the Ignite cluster (we use the web console to check). But after we kill the pod running one of the nodes and it restarts, that node is no longer visible in the Ignite cluster.

Moreover, the logs from the restarted node show very little:

[13:32:57]    __________  ________________
[13:32:57]   /  _/ ___/ |/ /  _/_  __/ __/
[13:32:57]  _/ // (7 7    // /  / / / _/
[13:32:57] /___/\___/_/|_/___/ /_/ /___/
[13:32:57]
[13:32:57] ver. 2.7.0#20181130-sha1:256ae401
[13:32:57] 2018 Copyright(C) Apache Software Foundation
[13:32:57]
[13:32:57] Ignite documentation: http://ignite.apache.org
[13:32:57]
[13:32:57] Quiet mode.
[13:32:57]   ^-- Logging to file '/opt/ignite/apache-ignite/work/log/ignite-7d323675.0.log'
[13:32:57]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[13:32:57]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
[13:32:57]
[13:32:57] OS: Linux 4.15.0-1036-azure amd64
[13:32:57] VM information: OpenJDK Runtime Environment 1.8.0_181-b13 Oracle Corporation OpenJDK 64-Bit Server VM 25.181-b13
[13:32:57] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
[13:32:57] Configured plugins:
[13:32:57]   ^-- None
[13:32:57]
[13:32:57] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]]]
[13:32:58] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[13:32:58] Security status [authentication=off, tls/ssl=off]

The logs from the remaining node report either 2 servers or 1 server, and the count keeps flipping between the two:

[14:02:05] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:02:15] Topology snapshot [ver=234, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:02:15] Topology snapshot [ver=235, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:02:20] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:02:30] Topology snapshot [ver=236, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:02:30] Topology snapshot [ver=237, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:02:35] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:02:45] Topology snapshot [ver=238, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:02:45] Topology snapshot [ver=239, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:02:50] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:03:00] Topology snapshot [ver=240, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:03:00] Topology snapshot [ver=241, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:03:06] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:03:16] Topology snapshot [ver=242, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:03:16] Topology snapshot [ver=243, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:03:21] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:03:31] Topology snapshot [ver=244, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:03:31] Topology snapshot [ver=245, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:03:36] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:03:46] Topology snapshot [ver=246, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:03:46] Topology snapshot [ver=247, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:03:51] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]
[14:04:01] Topology snapshot [ver=248, locNode=a5eb30e1, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
[14:04:01] Topology snapshot [ver=249, locNode=a5eb30e1, servers=1, clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
[14:04:06] Joining node doesn't have encryption data [node=7d323675-bc0b-4507-affb-672b25766201]

I am attaching our config file for the Ignite server and the yaml files for Kubernetes. Everything there was set up according to your official documentation. The Ignite version we are currently trying is 2.7.0.

Looking forward to your answer.

--
ALENA LAAS
SOFTWARE ENGINEER (JAVA)
CNET Content Solutions
OFFICE +7.495.967.1201
FAX +7.495.967.1203
5 Letnikovskaya str., Moscow, Russia, 115114

<?xml version="1.0" encoding="UTF-8"?>

<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->

<!--
    Ignite Spring configuration file to startup Ignite cache.

    This file demonstrates how to configure cache using Spring. Provided cache
    will be created on node startup.

    Use this configuration file when running HTTP REST examples (see 'examples/rest' folder).

    When starting a standalone node, you need to execute the following command:
    {IGNITE_HOME}/bin/ignite.{bat|sh} examples/config/example-cache.xml

    When starting Ignite from Java IDE, pass path to this file to Ignition:
    Ignition.start("examples/config/example-cache.xml");
-->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="igniteInstanceName" value="ignite-grid"/>

        <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
                        <property name="namespace" value="fcat-stage"/>
                        <property name="serviceName" value="ignite-cache-service"/>
                    </bean>
                </property>
            </bean>
        </property>

        <!-- Durable memory configuration. -->
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <!-- Setting the size of the default region to 20GB. -->
                        <property name="maxSize" value="#{20L * 1024 * 1024 * 1024}"/>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>

fcat-ignite-stage.yaml
Description: application/yaml