Re: Spark to Ignite Data load, Ignite node crashing

2018-08-13 Thread dkarachentsev
Hi,

Looks like it was killed by the kernel. Check the logs for the OOM killer:
grep -i 'killed process' /var/log/messages

If the process was killed by Linux, correct your config: you might have set too
much memory for Ignite page memory; set it to lower values [1].

If not, try to find the process in the logs by its PID; it may have been killed for some other reason.

[1] https://apacheignite.readme.io/docs/memory-configuration
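
For example, here is a minimal sketch in Scala (a programmatic equivalent of what
would go into default-config.xml; the 4 GiB cap and object name are illustrative,
not taken from your setup) of shrinking the default data region so that JVM heap,
data region and checkpoint buffer together fit comfortably into the node's 16 GB
of physical RAM:

import org.apache.ignite.Ignition
import org.apache.ignite.configuration.{DataRegionConfiguration, DataStorageConfiguration, IgniteConfiguration}

object SmallerDataRegion extends App {
  // Cap the default persistent data region at 4 GiB (illustrative) instead of 10 GiB.
  val region = new DataRegionConfiguration()
    .setName("default")
    .setInitialSize(256L * 1024 * 1024)   // 256 MiB
    .setMaxSize(4L * 1024 * 1024 * 1024)  // 4 GiB
    .setPersistenceEnabled(true)

  val storage = new DataStorageConfiguration()
    .setDefaultDataRegionConfiguration(region)

  val cfg = new IgniteConfiguration()
    .setDataStorageConfiguration(storage)

  // Start a node with the smaller region; the same values can be set as a
  // DataRegionConfiguration bean in the Spring XML config instead.
  Ignition.start(cfg)
}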

Thanks!
-Dmitry





Re: Spark to Ignite Data load, Ignite node crashing

2018-08-09 Thread ApacheUser
Attaching the logs of the two nodes that crash every time. I have 4 nodes, but the
other two nodes very rarely crash. All nodes (VMs) have 4 CPUs / 16 GB RAM / 200 GB
HDD (shared storage).

node 3:
[16:35:21,938][INFO][main][IgniteKernal] 

>>>    __________  ________________  
>>>   /  _/ ___/ |/ /  _/_  __/ __/  
>>>  _/ // (7 7    // /  / / / _/    
>>> /___/\___/_/|_/___/ /_/ /___/   
>>> 
>>> ver. 2.6.0#20180710-sha1:669feacc
>>> 2018 Copyright(C) Apache Software Foundation
>>> 
>>> Ignite documentation: http://ignite.apache.org

[16:35:21,946][INFO][main][IgniteKernal] Config URL:
file:/data/ignitedata/apache-ignite-fabric-2.6.0-bin/config/default-config.xml
[16:35:21,954][INFO][main][IgniteKernal] IgniteConfiguration
[igniteInstanceName=null, pubPoolSize=8, svcPoolSize=8, callbackPoolSize=8,
stripedPoolSize=8, sysPoolSize=8, mgmtPoolSize=4, igfsPoolSize=4,
dataStreamerPoolSize=8, utilityCachePoolSize=8,
utilityCacheKeepAliveTime=6, p2pPoolSize=2, qryPoolSize=8,
igniteHome=/data/ignitedata/apache-ignite-fabric-2.6.0-bin,
igniteWorkDir=/data/ignitedata/apache-ignite-fabric-2.6.0-bin/work,
mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6f94fa3e,
nodeId=df202ccb-356f-426a-8131-e2cc0b9bf98f,
marsh=org.apache.ignite.internal.binary.BinaryMarshaller@3023df74,
marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000,
sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=1,
metricsUpdateFreq=2000, metricsExpTime=9223372036854775807,
discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0,
marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=60,
forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null],
segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true,
allResolversPassReq=true, segChkFreq=1, commSpi=TcpCommunicationSpi
[connectGate=null, connPlc=null, enableForcibleNodeKill=false,
enableTroubleshootingLog=false,
srvLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@6302bbb1,
locAddr=null, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1,
directBuf=true, directSndBuf=false, idleConnTimeout=60,
connTimeout=5000, maxConnTimeout=60, reconCnt=10, sockSndBuf=32768,
sockRcvBuf=32768, msgQueueLimit=0, slowClientQueueLimit=1000, nioSrvr=null,
shmemSrv=null, usePairedConnections=false, connectionsPerNode=1,
tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32,
unackedMsgsBufSize=0, sockWriteTimeout=2000, lsnr=null, boundTcpPort=-1,
boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null,
ctxInitLatch=java.util.concurrent.CountDownLatch@31304f14[Count = 1],
stopping=false,
metricsLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationMetricsListener@34a3d150],
evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@2a4fb17b,
colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [lsnr=null],
indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@7cc0cdad,
addrRslvr=null, clientMode=false, rebalanceThreadPoolSize=1,
txCfg=org.apache.ignite.configuration.TransactionConfiguration@7c7b252e,
cacheSanityCheckEnabled=true, discoStartupDelay=6, deployMode=SHARED,
p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100,
timeSrvPortRange=100, failureDetectionTimeout=1,
clientFailureDetectionTimeout=3, metricsLogFreq=6, hadoopCfg=null,
connectorCfg=org.apache.ignite.configuration.ConnectorConfiguration@4d5d943d,
odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration
[seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null,
grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null,
binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration
[sysRegionInitSize=41943040, sysCacheMaxSize=104857600, pageSize=0,
concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=default,
maxSize=10737418240, initSize=268435456, swapPath=null,
pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100,
metricsEnabled=true, metricsSubIntervalCount=5,
metricsRateTimeInterval=6, persistenceEnabled=true,
checkpointPageBufSize=0], storagePath=/data/ignitedata/data,
checkpointFreq=18, lockWaitTime=1, checkpointThreads=4,
checkpointWriteOrder=SEQUENTIAL, walHistSize=20, walSegments=10,
walSegmentSize=67108864, walPath=/root/ignite/wal,
walArchivePath=db/wal/archive, metricsEnabled=true, walMode=LOG_ONLY,
walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000,
walRecordIterBuffSize=67108864, alwaysWriteFullPages=false,
fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@4c583ecf,
metricsSubIntervalCnt=5, metricsRateTimeInterval=6,
walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false,
walCompactionEnabled=false], activeOnStart=true, autoActivation=true,
longQryWarnTimeout=500, sqlConnCfg=null,
cliConnCfg=ClientConnectorConfiguration [host=null, port=10800,
portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true,
maxOpenCursorsPerConn=128, threadPoolSize=8, 

Spark to Ignite Data load, Ignite node crashing

2018-08-08 Thread ApacheUser
Hello Ignite team,

I am writing data from a Spark DataFrame to Ignite, and frequently one node goes
down. I don't see any error in the log file; below is the trace. If I restart the
node, it doesn't join the cluster unless I stop the Spark job that is writing data
to the Ignite cluster.

I have 4 nodes with 4 CPUs / 16 GB RAM / 200 GB disk space, and persistence is
enabled. What could be the reason?
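
For context, a minimal Scala sketch of the kind of write described above, assuming
the ignite-spark module's DataFrame data source (the input path, table name and key
field are hypothetical, since the actual write options are not shown in the thread):

import org.apache.ignite.spark.IgniteDataFrameSettings._
import org.apache.spark.sql.{SaveMode, SparkSession}

object SparkToIgniteLoad extends App {
  val spark = SparkSession.builder().appName("spark-to-ignite-load").getOrCreate()

  // Illustrative input; the real source DataFrame is not shown in the thread.
  val df = spark.read.json("/path/to/input.json")

  df.write
    .format(FORMAT_IGNITE)                       // Ignite DataFrame data source
    .option(OPTION_CONFIG_FILE,
      "/data/ignitedata/apache-ignite-fabric-2.6.0-bin/config/default-config.xml")
    .option(OPTION_TABLE, "my_table")            // hypothetical table name
    .option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS, "id") // hypothetical key field
    .mode(SaveMode.Append)
    .save()

  spark.stop()
}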

[00:44:33]    __________  ________________  
[00:44:33]   /  _/ ___/ |/ /  _/_  __/ __/  
[00:44:33]  _/ // (7 7    // /  / / / _/    
[00:44:33] /___/\___/_/|_/___/ /_/ /___/   
[00:44:33]
[00:44:33] ver. 2.6.0#20180710-sha1:669feacc
[00:44:33] 2018 Copyright(C) Apache Software Foundation
[00:44:33]
[00:44:33] Ignite documentation: http://ignite.apache.org
[00:44:33]
[00:44:33] Quiet mode.
[00:44:33]   ^-- Logging to file
'/data/ignitedata/apache-ignite-fabric-2.6.0-bin/work/log/ignite-d90d68c6.0.log'
[00:44:33]   ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[00:44:33]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false
or "-v" to ignite.{sh|bat}
[00:44:33]
[00:44:33] OS: Linux 3.10.0-862.3.2.el7.x86_64 amd64
[00:44:33] VM information: Java(TM) SE Runtime Environment 1.8.0_171-b11
Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.171-b11
[00:44:33] Configured plugins:
[00:44:33]   ^-- None
[00:44:33]
[00:44:33] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler
[tryStop=false, timeout=0]]
[00:44:33] Message queue limit is set to 0 which may lead to potential OOMEs
when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to
message queues growth on sender and receiver sides.
[00:44:33] Security status [authentication=off, tls/ssl=off]
[00:44:35] Nodes started on local machine require more than 20% of physical
RAM what can lead to significant slowdown due to swapping (please decrease
JVM heap size, data region size or checkpoint buffer size)
[required=13412MB, available=15885MB]
[00:44:35] Performance suggestions for grid  (fix if possible)
[00:44:35] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[00:44:35]   ^-- Set max direct memory size if getting 'OOME: Direct buffer
memory' (add '-XX:MaxDirectMemorySize=[g|G|m|M|k|K]' to JVM options)
[00:44:35]   ^-- Disable processing of calls to System.gc() (add
'-XX:+DisableExplicitGC' to JVM options)
[00:44:35]   ^-- Speed up flushing of dirty pages by OS (alter
vm.dirty_expire_centisecs parameter by setting to 500)
[00:44:35]   ^-- Reduce pages swapping ratio (set vm.swappiness=10)
[00:44:35] Refer to this page for more performance suggestions:
https://apacheignite.readme.io/docs/jvm-and-system-tuning
[00:44:35]
[00:44:35] To start Console Management & Monitoring run
ignitevisorcmd.{sh|bat}
[00:44:35]
[00:44:35] Ignite node started OK (id=d90d68c6)
[00:44:35] >>> Ignite cluster is not active (limited functionality
available). Use control.(sh|bat) script or IgniteCluster interface to
activate.
[00:44:35] Topology snapshot [ver=4, servers=4, clients=0, CPUs=16,
offheap=40.0GB, heap=4.0GB]
[00:44:35]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=INACTIVE]
[00:44:35]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:44:35]   ^-- 1 nodes left for auto-activation
[a99529d8-e483-44b3-96eb-a5a773e380e3]
[00:44:35] Data Regions Configured:
[00:44:35]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:20] Topology snapshot [ver=5, servers=4, clients=1, CPUs=16,
offheap=50.0GB, heap=8.4GB]
[00:48:20]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:20]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:20] Data Regions Configured:
[00:48:20]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:37] Topology snapshot [ver=6, servers=4, clients=2, CPUs=16,
offheap=60.0GB, heap=12.0GB]
[00:48:37]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:37]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:37] Data Regions Configured:
[00:48:37]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:37] Topology snapshot [ver=7, servers=4, clients=3, CPUs=16,
offheap=70.0GB, heap=16.0GB]
[00:48:37]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:37]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:37] Data Regions Configured:
[00:48:37]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:38] Topology snapshot [ver=8, servers=4, clients=4, CPUs=16,
offheap=80.0GB, heap=19.0GB]
[00:48:38]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,
clusterState=ACTIVE]
[00:48:38]   ^-- Baseline [id=0, size=4, online=3, offline=1]
[00:48:38] Data Regions Configured:
[00:48:38]   ^-- default [initSize=256.0 MiB, maxSize=10.0 GiB,
persistenceEnabled=true]
[00:48:40] Topology snapshot [ver=9, servers=4, clients=5, CPUs=16,
offheap=90.0GB, heap=23.0GB]
[00:48:40]   ^-- Node [id=D90D68C6-C725-43F8-BC32-71363FE3E86F,