I'm using Ignite 2.6 with the C# client. I have a running cluster that I was debugging. All requests were read-only (there were no state-mutating operations running in the cluster).
I terminated the one server node in the grid (running in the debugger) to make a small code change and re-run it (I do this frequently). The node may have been stopped for longer than the partitioning timeout. On re-running the server node, it failed to start. On re-running the complete cluster, it still failed to start, and all other nodes report a failure to connect to an inactive grid.
Looking at the log for the failing server node, I get the following log, showing an exception while initializing a WAL segment. This failure seems permanent and is unexpected, as we are using the strict WAL atomicity mode (WalMode.Fsync) for all persisted regions. Is this a recoverable error, or does this imply data loss? [NB: This is a dev system, so no prod data is affected.]
2018-11-29 12:26:09,933 [1] INFO ImmutableCacheComputeServer >>> __________ ________________ >>> / _/ ___/ |/ / _/_ __/ __/ >>> _/ // (7 7 // / / / / _/ >>> /___/\___/_/|_/___/ /_/ /___/ >>> >>> ver. 2.6.0#20180710-sha1:669feacc >>> 2018 Copyright(C) Apache Software Foundation >>> >>> Ignite documentation: http://ignite.apache.org 2018-11-29 12:26:09,933 [1] INFO ImmutableCacheComputeServer Config URL: n/a 2018-11-29 12:26:09,948 [1] INFO ImmutableCacheComputeServer IgniteConfiguration [igniteInstanceName=TRex-Immutable, pubPoolSize=50, svcPoolSize=12, callbackPoolSize=12, stripedPoolSize=12, sysPoolSize=12, mgmtPoolSize=4, igfsPoolSize=12, dataStreamerPoolSize=12, utilityCachePoolSize=12, utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=12, igniteHome=null, igniteWorkDir=C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6e4784bc, nodeId=8f32d0a6-539c-40dd-bc42-d044f28bac73, marsh=org.apache.ignite.internal.binary.BinaryMarshaller@e4487af, marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=5000, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807,
discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null], segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=10000, commSpi=TcpCommunicationSpi [connectGate=null, connPlc=null, enableForcibleNodeKill=false, enableTroubleshootingLog=false, srvLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@10d68fcd, locAddr=127.0.0.1, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=30000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=1024, slowClientQueueLimit=0, nioSrvr=null, shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=16, unackedMsgsBufSize=0, sockWriteTimeout=2000, lsnr=null, boundTcpPort=-1, boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch@117e949d[Count = 1], stopping=false, metricsLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationMetricsListener@6db9f5a4], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@5f8edcc5, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [lsnr=null], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@7a675056, addrRslvr=null, clientMode=false, rebalanceThreadPoolSize=1, txCfg=org.apache.ignite.configuration.TransactionConfiguration@d21a74c, cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=SHARED, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=10000, clientFailureDetectionTimeout=30000, metricsLogFreq=10000, hadoopCfg=null, connectorCfg=org.apache.ignite.configuration.ConnectorConfiguration@6e509ffa, odbcCfg=null, warmupClos=null, 
atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=PlatformDotNetConfiguration [binaryCfg=null], binaryCfg=BinaryConfiguration [idMapper=null, nameMapper=null, serializer=null, compactFooter=true], memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040, sysCacheMaxSize=104857600, pageSize=16384, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=Default-Immutable, maxSize=1073741824, initSize=134217728, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=false, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0], storagePath=/persist\TRexIgniteData\Immutable\Persistence, checkpointFreq=180000, lockWaitTime=10000, checkpointThreads=4, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, walSegments=10, walSegmentSize=67108864, walPath=/persist\TRexIgniteData\Immutable\WalStore, walArchivePath=/persist\TRexIgniteData\Immutable\WalArchive, metricsEnabled=false, walMode=FSYNC, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@2f465398, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false, walCompactionEnabled=false], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=null, port=10800, portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=128, threadPoolSize=12, idleTimeout=0, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null], authEnabled=false, failureHnd=null, commFailureRslvr=null] 
2018-11-29 12:26:09,949 [1] INFO ImmutableCacheComputeServer Daemon mode: off 2018-11-29 12:26:09,949 [1] INFO ImmutableCacheComputeServer OS: Windows 10 10.0 amd64 2018-11-29 12:26:09,949 [1] INFO ImmutableCacheComputeServer OS user: rwilson 2018-11-29 12:26:09,953 [1] INFO ImmutableCacheComputeServer PID: 7836 2018-11-29 12:26:09,953 [1] INFO ImmutableCacheComputeServer Language runtime: Java Platform API Specification ver. 1.8 2018-11-29 12:26:09,953 [1] INFO ImmutableCacheComputeServer VM information: Java(TM) SE Runtime Environment 1.8.0_191-b12 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.191-b12 2018-11-29 12:26:09,954 [1] INFO ImmutableCacheComputeServer VM total memory: 0.89GB 2018-11-29 12:26:09,954 [1] INFO ImmutableCacheComputeServer Remote Management [restart: off, REST: on, JMX (remote: off)] 2018-11-29 12:26:09,955 [1] INFO ImmutableCacheComputeServer Logger: PlatformLogger [traceEnabled=false, debugEnabled=false, infoEnabled=true, isQuiet=false] 2018-11-29 12:26:09,956 [1] INFO ImmutableCacheComputeServer IGNITE_HOME=null 2018-11-29 12:26:09,956 [1] INFO ImmutableCacheComputeServer VM arguments: [-DIGNITE_QUIET=false, -Djava.net.preferIPv4Stack=true, -Xms512m, -Xmx1024m] 2018-11-29 12:26:09,956 [1] INFO ImmutableCacheComputeServer System cache's DataRegion size is configured to 40 MB. Use DataStorageConfiguration.systemCacheMemorySize property to change the setting. 2018-11-29 12:26:09,956 [1] INFO ImmutableCacheComputeServer Configured caches [in 'sysMemPlc' dataRegion: ['ignite-sys-cache'], in 'Default-Immutable' dataRegion: ['Spatial-Immutable', 'Spatial-Immutable-Compressed']] 2018-11-29 12:26:09,962 [1] INFO ImmutableCacheComputeServer Local node user attribute [Role-PSNode=True] 2018-11-29 12:26:09,962 [1] INFO ImmutableCacheComputeServer Local node user attribute [Owner=TRex-Immutable] 2018-11-29 12:26:09,968 [3] WARN ImmutableCacheComputeServer This operating system has been tested less rigorously: Windows 10 10.0 amd64. 
Our team will appreciate the feedback if you experience any problems running ignite in this environment. 2018-11-29 12:26:10,025 [1] INFO ImmutableCacheComputeServer Configured plugins: 2018-11-29 12:26:10,025 [1] INFO ImmutableCacheComputeServer ^-- None 2018-11-29 12:26:10,025 [1] INFO ImmutableCacheComputeServer 2018-11-29 12:26:10,026 [1] INFO ImmutableCacheComputeServer Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0]] 2018-11-29 12:26:10,125 [1] INFO ImmutableCacheComputeServer Successfully bound communication NIO server to TCP port [port=47100, locHost=/127.0.0.1, selectorsCnt=4, selectorSpins=0, pairedConn=false] 2018-11-29 12:26:10,143 [1] WARN ImmutableCacheComputeServer Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation) 2018-11-29 12:26:10,165 [1] WARN ImmutableCacheComputeServer Collision resolution is disabled (all jobs will be activated upon arrival). 2018-11-29 12:26:10,165 [1] INFO ImmutableCacheComputeServer Security status [authentication=off, tls/ssl=off] 2018-11-29 12:26:10,191 [1] INFO ImmutableCacheComputeServer Successfully bound to TCP port [port=47500, localHost=127.0.0.1/127.0.0.1, locNodeId=8f32d0a6-539c-40dd-bc42-d044f28bac73] 2018-11-29 12:26:10,206 [1] INFO ImmutableCacheComputeServer Successfully locked persistence storage folder [C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable\persist\TRexIgniteData\Immutable\Persistence\node00-3cdb19b9-8993-40ba-a170-60d46b28dd8a] 2018-11-29 12:26:10,207 [1] INFO ImmutableCacheComputeServer Consistent ID used for local node is [3cdb19b9-8993-40ba-a170-60d46b28dd8a] according to persistence data storage folders 2018-11-29 12:26:10,213 [1] INFO ImmutableCacheComputeServer Resolved directory for serialized binary metadata: C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable\binary_meta\node00-3cdb19b9-8993-40ba-a170-60d46b28dd8a 2018-11-29 12:26:10,458 [1] INFO ImmutableCacheComputeServer Resolved page store 
work directory: C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable\persist\TRexIgniteData\Immutable\Persistence\node00-3cdb19b9-8993-40ba-a170-60d46b28dd8a 2018-11-29 12:26:10,459 [1] INFO ImmutableCacheComputeServer Resolved write ahead log work directory: C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable\persist\TRexIgniteData\Immutable\WalStore\node00-3cdb19b9-8993-40ba-a170-60d46b28dd8a 2018-11-29 12:26:10,461 [1] INFO ImmutableCacheComputeServer Resolved write ahead log archive directory: C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable\persist\TRexIgniteData\Immutable\WalArchive\node00-3cdb19b9-8993-40ba-a170-60d46b28dd8a 2018-11-29 12:26:10,478 [1] ERROR ImmutableCacheComputeServer Exception during start processors, node will be stopped and close connections 2018-11-29 12:26:10,479 [1] ERROR ImmutableCacheComputeServer Got exception while starting (will rollback startup routine). 2018-11-29 12:26:10,480 [1] WARN ImmutableCacheComputeServer Attempt to stop starting grid. This operation cannot be guaranteed to be successful. 2018-11-29 12:26:10,588 [1] INFO ImmutableCacheComputeServer >>> +---------------------------------------------------------------------------------+ >>> Ignite ver. 
2.6.0#20180710-sha1:669feacc5d3a4e60efcdd300dc8de99780f38eed stopped OK >>> +---------------------------------------------------------------------------------+ >>> Ignite instance name: TRex-Immutable >>> Grid uptime: 00:00:02.448 Exception during creation of new Ignite node: Apache.Ignite.Core.Common.IgniteException: Failed to start processor: GridProcessorAdapter [] ---> Apache.Ignite.Core.Common.JavaException: class org.apache.ignite.IgniteException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:990) at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:48) at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:75) Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1742) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:980) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649) at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:43) ... 
1 more Caused by: class org.apache.ignite.IgniteCheckedException: Failed to initialize WAL log segment (WAL segment size change is not supported):C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable\persist\TRexIgniteData\Immutable\WalStore\node00-3cdb19b9-8993-40ba-a170-60d46b28dd8a\0000000000000008.wal at org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.checkFiles(FsyncModeFileWriteAheadLogManager.java:1997) at org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.checkOrPrepareFiles(FsyncModeFileWriteAheadLogManager.java:1121) at org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.start0(FsyncModeFileWriteAheadLogManager.java:348) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700) at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1739) ... 
7 more at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.ExceptionCheck() at Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.IgnitionStart(Env env, String cfgPath, String gridName, Boolean clientMode, Boolean userLogger, Int64 igniteId, Boolean redirectConsole) at Apache.Ignite.Core.Ignition.Start(IgniteConfiguration cfg) --- End of inner exception stack trace --- at Apache.Ignite.Core.Ignition.Start(IgniteConfiguration cfg) at VSS.TRex.Servers.Compute.ImmutableCacheComputeServer.StartTRexGridCacheNode() in C:\Dev\VSS.Productivity3D.MonoRepo\src\service\TRex\src\netstandard\VSS.TRex.GridFabric\Servers\Compute\ImmutableCacheComputeServer.cs:line 236 2018-11-29 12:26:10,679 [1] ERROR ImmutableCacheComputeServer Exception during creation of new Ignite node: Apache.Ignite.Core.Common.IgniteException: Failed to start processor: GridProcessorAdapter [] ---> Apache.Ignite.Core.Common.JavaException: class org.apache.ignite.IgniteException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:990) at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:48) at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:75) Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1742) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:980) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649) at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:43) ... 
1 more Caused by: class org.apache.ignite.IgniteCheckedException: Failed to initialize WAL log segment (WAL segment size change is not supported):C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable\persist\TRexIgniteData\Immutable\WalStore\node00-3cdb19b9-8993-40ba-a170-60d46b28dd8a\0000000000000008.wal at org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.checkFiles(FsyncModeFileWriteAheadLogManager.java:1997) at org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.checkOrPrepareFiles(FsyncModeFileWriteAheadLogManager.java:1121) at org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.start0(FsyncModeFileWriteAheadLogManager.java:348) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700) at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1739) ... 
7 more at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.ExceptionCheck() at Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.IgnitionStart(Env env, String cfgPath, String gridName, Boolean clientMode, Boolean userLogger, Int64 igniteId, Boolean redirectConsole) at Apache.Ignite.Core.Ignition.Start(IgniteConfiguration cfg) --- End of inner exception stack trace --- at Apache.Ignite.Core.Ignition.Start(IgniteConfiguration cfg) at VSS.TRex.Servers.Compute.ImmutableCacheComputeServer.StartTRexGridCacheNode() in C:\Dev\VSS.Productivity3D.MonoRepo\src\service\TRex\src\netstandard\VSS.TRex.GridFabric\Servers\Compute\ImmutableCacheComputeServer.cs:line 236 2018-11-29 12:26:10,684 [1] INFO ImmutableCacheComputeServer Completed creation of new Ignite node Unhandled Exception: Apache.Ignite.Core.Common.IgniteException: Failed to start processor: GridProcessorAdapter [] ---> Apache.Ignite.Core.Common.JavaException: class org.apache.ignite.IgniteException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:990) at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:48) at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:75) Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1742) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:980) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649) at 
org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:43) ... 1 more Caused by: class org.apache.ignite.IgniteCheckedException: Failed to initialize WAL log segment (WAL segment size change is not supported):C:\Users\rwilson\AppData\Local\Temp\TRexIgniteData\Immutable\persist\TRexIgniteData\Immutable\WalStore\node00-3cdb19b9-8993-40ba-a170-60d46b28dd8a\0000000000000008.wal at org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.checkFiles(FsyncModeFileWriteAheadLogManager.java:1997) at org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.checkOrPrepareFiles(FsyncModeFileWriteAheadLogManager.java:1121) at org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager.start0(FsyncModeFileWriteAheadLogManager.java:348) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700) at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1739) ... 7 more at Apache.Ignite.Core.Impl.Unmanaged.Jni.Env.ExceptionCheck() at Apache.Ignite.Core.Impl.Unmanaged.UnmanagedUtils.IgnitionStart(Env env, String cfgPath, String gridName, Boolean clientMode, Boolean userLogger, Int64 igniteId, Boolean redirectConsole) at Apache.Ignite.Core.Ignition.Start(IgniteConfiguration cfg) --- End of inner exception stack trace --- ......... ......... Nov 29, 2018 12:26:22 PM java.util.logging.LogManager$RootLogger log WARNING: Possible too long JVM pause: 11095 milliseconds.
Thanks,
Raymond.
