Hello Ignite team,

I am writing data from a Spark DataFrame to Ignite, and one node frequently
goes down. I don't see any error in the log file; the console output is
below. If I restart the node, it doesn't rejoin the cluster unless I stop the
Spark job that is writing data to the Ignite cluster.

I have 4 nodes, each with 4 CPUs, 16 GB RAM, and 200 GB of disk space;
persistence is enabled. What could be the reason?

[00:44:33]    __________  ________________
[00:44:33]   /  _/ ___/ |/ /  _/_  __/ __/
[00:44:33]  _/ // (7 7    // /  / / / _/
[00:44:33] /___/\___/_/|_/___/ /_/ /___/
[00:44:33]
[00:44:33] ver. 2.6.0#20180710-sha1:669feacc
[00:44:33] 2018 Copyright(C) Apache Software Foundation
[00:44:33]
[00:44:33] Ignite documentation: http://ignite.apache.org
[00:44:33]
[00:44:33] Quiet mode.
[00:44:33]   ^-- Logging to file
'/data/ignitedata/apache-ignite-fabric-2.6.0-bin/work/log/ignite-d90d68c6.0.log'
[00:44:33]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[00:44:33]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false
or "-v" to ignite.{sh|bat}
[00:44:33]
[00:44:33] OS: Linux 3.10.0-862.3.2.el7.x86_64 amd64
[00:44:33] VM information: Java(TM) SE Runtime Environment 1.8.0_171-b11
Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.171-b11
[00:44:33] Configured plugins:
[00:44:33]   ^-- None
[00:44:33]
[00:44:33] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler
[tryStop=false, timeout=0]]
[00:44:33] Message queue limit is set to 0 which may lead to potential OOMEs
when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to
message queues growth on sender and receiver sides.
[00:44:33] Security status [authentication=off, tls/ssl=off]
[00:44:35] Nodes started on local machine require more than 20% of physical
RAM what can lead to significant slowdown due to swapping (please decrease
JVM heap size, data region size or checkpoint buffer size)
[required=13412MB, available=15885MB]
[00:44:35] Performance suggestions for grid  (fix if possible)
[00:44:35] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[00:44:35]   ^-- Set max direct memory size if getting 'OOME: Direct buffer
memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[00:44:35]   ^-- Disable processing of calls to System.gc() (add
'-XX:+DisableExplicitGC' to JVM options)
[00:44:35]   ^-- Speed up flushing of dirty pages by OS (alter
vm.dirty_expire_centisecs parameter by setting to 500)
[00:44:35]   ^-- Reduce pages swapping ratio (set vm.swappiness=10)
[00:44:35] Refer to this page for more performance suggestions:
https://apacheignite.readme.io/docs/jvm-and-system-tuning
[00:44:35]
[00:44:35] To start Console Management & Monitoring run
ignitevisorcmd.{sh|bat}
[00:44:35]
[00:44:35] Ignite node started OK (id=d90d68c6)
[00:44:35] >>> Ignite cluster is not active (limited functionality
available). Use control.(sh|bat) script or IgniteCluster interface to
activate.
[00:44:35] Topology snapshot [ver=4, servers=4, clients=0, CPUs=16,
offheap=40.0GB, heap=4.0GB]
[00:44:35]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=INACTIVE]
[00:44:35]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:44:35]   ^-- 1 nodes left for auto-activation
[a99529d8-e483-44b3-96eb-a5a773e380e3]
[00:44:35] Data Regions Configured:
[00:44:35]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:20] Topology snapshot [ver=5, servers=4, clients=1, CPUs=16,
offheap=50.0GB, heap=8.4GB]
[00:48:20]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:20]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:20] Data Regions Configured:
[00:48:20]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:37] Topology snapshot [ver=6, servers=4, clients=2, CPUs=16,
offheap=60.0GB, heap=12.0GB]
[00:48:37]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:37]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:37] Data Regions Configured:
[00:48:37]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:37] Topology snapshot [ver=7, servers=4, clients=3, CPUs=16,
offheap=70.0GB, heap=16.0GB]
[00:48:37]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:37]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:37] Data Regions Configured:
[00:48:37]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:38] Topology snapshot [ver=8, servers=4, clients=4, CPUs=16,
offheap=80.0GB, heap=19.0GB]
[00:48:38]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:38]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:38] Data Regions Configured:
[00:48:38]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:40] Topology snapshot [ver=9, servers=4, clients=5, CPUs=16,
offheap=90.0GB, heap=23.0GB]
[00:48:40]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:40]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:40] Data Regions Configured:
[00:48:40]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:40] Topology snapshot [ver=10, servers=4, clients=6, CPUs=16,
offheap=100.0GB, heap=26.0GB]
[00:48:40]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:40]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:40] Data Regions Configured:
[00:48:40]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
bin/ignite.sh: line 183:  6035 Killed                  "$JAVA" ${JVM_OPTS}
${QUIET} "${RESTART_SUCCESS_OPT}" ${JMX_MON} -DIGNITE_HOME="${IGNITE_HOME}"
-DIGNITE_PROG_NAME="$0" ${JVM_XOPTS} -cp "${CP}" ${MAIN_CLASS} "${CONFIG}"
#
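
A rough check of the memory figures from the startup warning above, as a
sketch only. The two numbers are copied from the log; the variable names are
illustrative, and the idea that the "Killed" line comes from the Linux OOM
killer is an assumption that only the kernel log on that host could confirm.

```python
# Figures copied from the startup warning in the console output above.
required_mb = 13412   # what Ignite reports it needs (heap, data region, checkpoint buffer)
available_mb = 15885  # physical RAM as reported by Ignite

# Memory left for the OS, page cache, and any other process on the host.
headroom_mb = available_mb - required_mb
share = required_mb / available_mb

print(f"Ignite is sized at {share:.0%} of physical RAM, "
      f"leaving {headroom_mb} MB for everything else")
```

If the headroom is too small once the Spark clients start loading data, the
kernel may terminate the JVM without anything being written to the Ignite log.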

Thanks


