Hello Ignite team,

I am writing data from a Spark DataFrame to Ignite, and one node frequently goes down. I don't see any error in the log file; the console trace is below. If I restart the node, it doesn't rejoin the cluster unless I stop the Spark job that is writing data to the Ignite cluster.
I have 4 nodes, each with 4 CPUs, 16 GB RAM and 200 GB of disk space, and persistence is enabled. What could be the reason?

[00:44:33]    __________  ________________
[00:44:33]   /  _/ ___/ |/ /  _/_  __/ __/
[00:44:33]  _/ // (7 7    // /  / / / _/
[00:44:33] /___/\___/_/|_/___/ /_/ /___/
[00:44:33]
[00:44:33] ver. 2.6.0#20180710-sha1:669feacc
[00:44:33] 2018 Copyright(C) Apache Software Foundation
[00:44:33]
[00:44:33] Ignite documentation: http://ignite.apache.org
[00:44:33]
[00:44:33] Quiet mode.
[00:44:33]   ^-- Logging to file '/data/ignitedata/apache-ignite-fabric-2.6.0-bin/work/log/ignite-d90d68c6.0.log'
[00:44:33]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[00:44:33]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
[00:44:33]
[00:44:33] OS: Linux 3.10.0-862.3.2.el7.x86_64 amd64
[00:44:33] VM information: Java(TM) SE Runtime Environment 1.8.0_171-b11 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.171-b11
[00:44:33] Configured plugins:
[00:44:33]   ^-- None
[00:44:33]
[00:44:33] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0]]
[00:44:33] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[00:44:33] Security status [authentication=off, tls/ssl=off]
[00:44:35] Nodes started on local machine require more than 20% of physical RAM what can lead to significant slowdown due to swapping (please decrease JVM heap size, data region size or checkpoint buffer size) [required=13412MB, available=15885MB]
[00:44:35] Performance suggestions for grid (fix if possible)
[00:44:35] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[00:44:35]   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[00:44:35]   ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options)
[00:44:35]   ^-- Speed up flushing of dirty pages by OS (alter vm.dirty_expire_centisecs parameter by setting to 500)
[00:44:35]   ^-- Reduce pages swapping ratio (set vm.swappiness=10)
[00:44:35] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
[00:44:35]
[00:44:35] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[00:44:35]
[00:44:35] Ignite node started OK (id=d90d68c6)
[00:44:35] >>> Ignite cluster is not active (limited functionality available). Use control.(sh|bat) script or IgniteCluster interface to activate.
[00:44:35] Topology snapshot [ver=4, servers=4, clients=0, CPUs=16, offheap=40.0GB, heap=4.0GB]
[00:44:35]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F, clusterState=INACTIVE]
[00:44:35]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:44:35]   ^-- 1 nodes left for auto-activation [a99529d8-e483-44b3-96eb-a5a773e380e3]
[00:44:35] Data Regions Configured:
[00:44:35]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB, persistenceEnabled=true]
[00:48:20] Topology snapshot [ver=5, servers=4, clients=1, CPUs=16, offheap=50.0GB, heap=8.4GB]
[00:48:20]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F, clusterState=ACTIVE]
[00:48:20]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:20] Data Regions Configured:
[00:48:20]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB, persistenceEnabled=true]
[00:48:37] Topology snapshot [ver=6, servers=4, clients=2, CPUs=16, offheap=60.0GB, heap=12.0GB]
[00:48:37]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F, clusterState=ACTIVE]
[00:48:37]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:37] Data Regions Configured:
[00:48:37]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB, persistenceEnabled=true]
[00:48:37] Topology snapshot [ver=7, servers=4, clients=3, CPUs=16, offheap=70.0GB, heap=16.0GB]
[00:48:37]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F, clusterState=ACTIVE]
[00:48:37]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:37] Data Regions Configured:
[00:48:37]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB, persistenceEnabled=true]
[00:48:38] Topology snapshot [ver=8, servers=4, clients=4, CPUs=16, offheap=80.0GB, heap=19.0GB]
[00:48:38]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F, clusterState=ACTIVE]
[00:48:38]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:38] Data Regions Configured:
[00:48:38]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB, persistenceEnabled=true]
[00:48:40] Topology snapshot [ver=9, servers=4, clients=5, CPUs=16, offheap=90.0GB, heap=23.0GB]
[00:48:40]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F, clusterState=ACTIVE]
[00:48:40]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:40] Data Regions Configured:
[00:48:40]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB, persistenceEnabled=true]
[00:48:40] Topology snapshot [ver=10, servers=4, clients=6, CPUs=16, offheap=100.0GB, heap=26.0GB]
[00:48:40]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F, clusterState=ACTIVE]
[00:48:40]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:40] Data Regions Configured:
[00:48:40]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB, persistenceEnabled=true]
bin/ignite.sh: line 183: 6035 Killed "$JAVA" ${JVM_OPTS} ${QUIET} "${RESTART_SUCCESS_OPT}" ${JMX_MON} -DIGNITE_HOME="${IGNITE_HOME}" -DIGNITE_PROG_NAME="$0" ${JVM_XOPTS} -cp "${CP}" ${MAIN_CLASS} "${CONFIG}"
#

Thanks

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
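
P.S. In case it helps with diagnosis, here is a rough sketch of the memory arithmetic behind the startup warning in the trace. It uses only the two figures the log prints (required=13412MB, available=15885MB); the function and variable names are my own, and it does not account for anything else running on the host.

```python
# Figures copied from the startup warning in the trace above.
REQUIRED_MB = 13412   # "required=13412MB": what the node says it needs
AVAILABLE_MB = 15885  # "available=15885MB": physical RAM the node sees


def memory_share(required_mb: int, available_mb: int) -> float:
    """Fraction of physical RAM the node expects to use."""
    return required_mb / available_mb


share = memory_share(REQUIRED_MB, AVAILABLE_MB)
headroom_mb = AVAILABLE_MB - REQUIRED_MB

# What is left over must cover the OS, page cache and any other processes.
print(f"Ignite budget: {share:.1%} of physical RAM, headroom: {headroom_mb} MB")
```

A bare `Killed` printed by the shell means the Java process received SIGKILL. On Linux that is often the kernel OOM killer, but I have not confirmed that this is what happened here.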