Re: Ignite in Kubernetes does not work correctly

2019-01-14 Thread Alena Laas
failureDetectionTimeout - 6
joinTimeout - 12
Saw these recommendations in one of the answers in your forum.
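
For reference, these two settings map to IgniteConfiguration and
TcpDiscoverySpi respectively. Below is a minimal Java sketch; the
millisecond values are assumptions for illustration (the numbers quoted
above appear truncated in the archive), not the poster's exact figures:

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

    public class TimeoutConfig {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Assumed value: time before the cluster declares an
            // unresponsive node failed.
            cfg.setFailureDetectionTimeout(60_000);

            TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
            // Assumed value: joinTimeout lives on the discovery SPI,
            // not on the top-level configuration.
            discoSpi.setJoinTimeout(120_000);
            cfg.setDiscoverySpi(discoSpi);

            Ignition.start(cfg);
        }
    }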

On Mon, Jan 14, 2019 at 2:21 PM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> Glad you managed to resolve it. What did you have to increase the values
> to?
>
> Regards,
> Stephen

Re: Ignite in Kubernetes does not work correctly

2019-01-14 Thread Stephen Darlington
Glad you managed to resolve it. What did you have to increase the values to?

Regards,
Stephen

> On 14 Jan 2019, at 09:34, Alena Laas  wrote:
> 
> It seems that increasing joinTimeout and failureDetectionTimeout solved the 
> problem.

Re: Ignite in Kubernetes does not work correctly

2019-01-14 Thread Alena Laas
It seems that increasing joinTimeout and failureDetectionTimeout solved the
problem.

On Fri, Jan 11, 2019 at 5:24 PM Alena Laas 
wrote:

> I attached part of the log with "node failed" events (100.99.129.141 is
> the IP of the restarted node).
>
> These events repeat until suddenly, after about 40 minutes to an hour,
> the node connects to the cluster.
>
> Could you explain why this is happening?

Re: Ignite in Kubernetes does not work correctly

2019-01-10 Thread Alena Laas
We are using an Azure AKS cluster.

We kill the pod using the Kubernetes dashboard or through kubectl (kubectl
delete pods ); either way, the result is the same.

Do you need any more logs from us?

On Thu, Jan 10, 2019 at 7:28 PM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> What kind of environment are you using? A public cloud? Your own data
> centre? And how are you killing the pod?
>
> I fired up a cluster using Minikube and your configuration and it worked
> as far as I could see. (I deleted the pod using the dashboard, for what
> that’s worth.)
>
> Regards,
> Stephen

Re: Ignite in Kubernetes does not work correctly

2019-01-10 Thread Stephen Darlington
What kind of environment are you using? A public cloud? Your own data centre? 
And how are you killing the pod?

I fired up a cluster using Minikube and your configuration and it worked as far 
as I could see. (I deleted the pod using the dashboard, for what that’s worth.)

Regards,
Stephen
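
The configuration mentioned above is not included in this archive. For
context, a typical Kubernetes discovery setup for Ignite 2.7 looks roughly
like the sketch below; the namespace and service name are hypothetical
placeholders, not values from this thread:

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder;

    public class KubernetesDiscovery {
        public static void main(String[] args) {
            // Resolves peer addresses from the endpoints of a Kubernetes
            // service, so pods can find each other after restarts.
            TcpDiscoveryKubernetesIpFinder ipFinder = new TcpDiscoveryKubernetesIpFinder();
            ipFinder.setNamespace("default");  // hypothetical namespace
            ipFinder.setServiceName("ignite"); // hypothetical headless service

            TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
            discoSpi.setIpFinder(ipFinder);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDiscoverySpi(discoSpi);

            Ignition.start(cfg);
        }
    }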

> On 10 Jan 2019, at 14:20, Alena Laas  wrote:
> 
> 
> 
> -- Forwarded message -
> From: Alena Laas
> Date: Thu, Jan 10, 2019 at 5:13 PM
> Subject: Ignite in Kubernetes does not work correctly
> To: user@ignite.apache.org
> Cc: Vadim Shcherbakov
> 
> 
> Hello!
> Could you please help with a problem with Ignite within a Kubernetes
> cluster?
> 
> When we start 2 Ignite nodes at the same time, or scale the Deployment
> from 1 to 2, everything is fine: both nodes are visible inside the Ignite
> cluster (we use the web console to see this).
> 
> But after we kill the pod with one node and it restarts, the node is no
> longer seen in the Ignite cluster. Moreover, the logs from the restarted
> node look sparse:
> [13:32:57]    __________  ________________
> [13:32:57]   /  _/ ___/ |/ /  _/_  __/ __/
> [13:32:57]  _/ // (7 7    // /  / / / _/
> [13:32:57] /___/\___/_/|_/___/ /_/ /___/
> [13:32:57] 
> [13:32:57] ver. 2.7.0#20181130-sha1:256ae401
> [13:32:57] 2018 Copyright(C) Apache Software Foundation
> [13:32:57] 
> [13:32:57] Ignite documentation: http://ignite.apache.org
> [13:32:57] 
> [13:32:57] Quiet mode.
> [13:32:57]   ^-- Logging to file 
> '/opt/ignite/apache-ignite/work/log/ignite-7d323675.0.log'
> [13:32:57]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
> [13:32:57]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or 
> "-v" to ignite.{sh|bat}
> [13:32:57] 
> [13:32:57] OS: Linux 4.15.0-1036-azure amd64
> [13:32:57] VM information: OpenJDK Runtime Environment 1.8.0_181-b13 Oracle 
> Corporation OpenJDK 64-Bit Server VM 25.181-b13
> [13:32:57] Please set system property '-Djava.net.preferIPv4Stack=true' to 
> avoid possible problems in mixed environments.
> [13:32:57] Configured plugins:
> [13:32:57]   ^-- None
> [13:32:57] 
> [13:32:57] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0, super=AbstractFailureHandler 
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED
> [13:32:58] Message queue limit is set to 0 which may lead to potential OOMEs 
> when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to 
> message queues growth on sender and receiver sides.
> [13:32:58] Security status [authentication=off, tls/ssl=off]
> 
> And the logs from the remaining node show either 2 servers or 1, with
> the count flapping back and forth:
> [14:02:05] Joining node doesn't have encryption data 
> [node=7d323675-bc0b-4507-affb-672b25766201]
> [14:02:15] Topology snapshot [ver=234, locNode=a5eb30e1, servers=2, 
> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
> [14:02:15] Topology snapshot [ver=235, locNode=a5eb30e1, servers=1, 
> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
> [14:02:20] Joining node doesn't have encryption data 
> [node=7d323675-bc0b-4507-affb-672b25766201]
> [14:02:30] Topology snapshot [ver=236, locNode=a5eb30e1, servers=2, 
> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
> [14:02:30] Topology snapshot [ver=237, locNode=a5eb30e1, servers=1, 
> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
> [14:02:35] Joining node doesn't have encryption data 
> [node=7d323675-bc0b-4507-affb-672b25766201]
> [14:02:45] Topology snapshot [ver=238, locNode=a5eb30e1, servers=2, 
> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
> [14:02:45] Topology snapshot [ver=239, locNode=a5eb30e1, servers=1, 
> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
> [14:02:50] Joining node doesn't have encryption data 
> [node=7d323675-bc0b-4507-affb-672b25766201]
> [14:03:00] Topology snapshot [ver=240, locNode=a5eb30e1, servers=2, 
> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
> [14:03:00] Topology snapshot [ver=241, locNode=a5eb30e1, servers=1, 
> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
> [14:03:06] Joining node doesn't have encryption data 
> [node=7d323675-bc0b-4507-affb-672b25766201]
> [14:03:16] Topology snapshot [ver=242, locNode=a5eb30e1, servers=2, 
> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
> [14:03:16] Topology snapshot [ver=243, locNode=a5eb30e1, servers=1, 
> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB, heap=1.0GB]
> [14:03:21] Joining node doesn't have encryption data 
> [node=7d323675-bc0b-4507-affb-672b25766201]
> [14:03:31] Topology snapshot [ver=244, locNode=a5eb30e1, servers=2, 
> clients=0, state=ACTIVE, CPUs=16, offheap=40.0GB, heap=2.0GB]
> [14:03:31] Topology snapshot [ver=245, locNode=a5eb30e1, servers=1, 
> clients=0, state=ACTIVE, CPUs=8, offheap=20.0GB,