I am using the default WAL mode; I think it's LOG_ONLY.
The log from the crashed Ignite server is below:

{"type":"log","host":"ignite-cluster-ap-ignite-10","level":"INFO","systemid":"296b639f","system":"ignite-service","time":"2019-01-31
16:01:29,093","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","marker":"","log":"Read
checkpoint status
[startMarker=/opt/ignite/apache-ignite-fabric-2.6.0-bin/persistence/node00-1ed7d92a-a181-4ffb-ad90-df30e3e1fa12/cp/1548909757044-63969238-f350-4b12-bdf5-f7a540021e58-START.bin,
endMarker=/opt/ignite/apache-ignite-fabric-2.6.0-bin/persistence/node00-1ed7d92a-a181-4ffb-ad90-df30e3e1fa12/cp/1548909575263-435715b4-71a9-4c2b-90ef-d831ed575ffc-END.bin]"}

{"type":"log","host":"ignite-cluster-ap-ignite-10","level":"INFO","systemid":"296b639f","system":"ignite-service","time":"2019-01-31
16:01:29,093","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","marker":"","log":"Checking
memory state [lastValidPos=FileWALPointer [idx=412, fileOff=50500521,
len=57801], lastMarked=FileWALPointer [idx=426, fileOff=38038736,
len=57801], lastCheckpointId=63969238-f350-4b12-bdf5-f7a540021e58]"}

{"type":"log","host":"ignite-cluster-ap-ignite-10","level":"WARN","systemid":"296b639f","system":"ignite-service","time":"2019-01-31
16:01:29,094","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","marker":"","log":"Ignite
node stopped in the middle of checkpoint. Will restore memory state and
finish checkpoint on node start."}

{"type":"log","host":"ignite-cluster-ap-ignite-10","level":"ERROR","systemid":"296b639f","system":"ignite-service","time":"2019-01-31
16:01:29,105","logger":"","timezone":"UTC","marker":"","log":"Critical
system error detected. Will be handled accordingly to configured handler
[hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler,
failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
o.a.i.i.pagemem.wal.StorageException: Failed to restore memory state
(checkpoint marker is present on disk, but checkpoint record is missed in
WAL) [cpStatus=CheckpointStatus [cpStartTs=1548909757044,
cpStartId=63969238-f350-4b12-bdf5-f7a540021e58, startPtr=FileWALPointer
[idx=426, fileOff=38038736, len=57801],
cpEndId=435715b4-71a9-4c2b-90ef-d831ed575ffc, endPtr=FileWALPointer
[idx=412, fileOff=50500521, len=57801]], lastRead=null]]] class
org.apache.ignite.internal.pagemem.wal.StorageException: Failed to restore
memory state (checkpoint marker is present on disk, but checkpoint record
is missed in WAL) [cpStatus=CheckpointStatus [cpStartTs=1548909757044,
cpStartId=63969238-f350-4b12-bdf5-f7a540021e58, startPtr=FileWALPointer
[idx=426, fileOff=38038736, len=57801],
cpEndId=435715b4-71a9-4c2b-90ef-d831ed575ffc, endPtr=FileWALPointer
[idx=412, fileOff=50500521, len=57801]], lastRead=null]

        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:2120)

        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1929)

        at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:755)

        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:789)

        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:674)

        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2419)

        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2299)

        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)

        at java.lang.Thread.run(Thread.java:748)

"}

{"type":"log","host":"ignite-cluster-ap-ignite-10","level":"ERROR","systemid":"296b639f","system":"ignite-service","time":"2019-01-31
16:01:29,106","logger":"","timezone":"UTC","marker":"","log":"JVM will be
halted immediately due to the failure: [failureCtx=FailureContext
[type=CRITICAL_ERROR,
err=class o.a.i.i.pagemem.wal.StorageException: Failed to restore memory
state (checkpoint marker is present on disk, but checkpoint record is
missed in WAL) [cpStatus=CheckpointStatus [cpStartTs=1548909757044,
cpStartId=63969238-f350-4b12-bdf5-f7a540021e58, startPtr=FileWALPointer
[idx=426, fileOff=38038736, len=57801],
cpEndId=435715b4-71a9-4c2b-90ef-d831ed575ffc, endPtr=FileWALPointer
[idx=412, fileOff=50500521, len=57801]], lastRead=null]]]"}
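
To make the checkpoint state in the log above easier to read, here is a minimal Python sketch that parses the two marker file names from the "Read checkpoint status" line. It assumes the names follow the pattern <timestamp>-<checkpoint id>-START|END.bin, as they appear in this log. The parse_marker helper and the CheckpointMarker type are only an illustration and are not part of Ignite.

```python
import re
from typing import NamedTuple


class CheckpointMarker(NamedTuple):
    """Fields recovered from a checkpoint marker file name (hypothetical helper type)."""
    timestamp_ms: int
    checkpoint_id: str
    kind: str  # "START" or "END"


# Assumed name layout, inferred from the log: <epoch millis>-<UUID>-<START|END>.bin
_MARKER_RE = re.compile(
    r"(?P<ts>\d+)-"
    r"(?P<id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})-"
    r"(?P<kind>START|END)\.bin"
)


def parse_marker(path: str) -> CheckpointMarker:
    """Parse the file-name part of a marker path; raise ValueError if it does not match."""
    name = path.rsplit("/", 1)[-1]
    m = _MARKER_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a checkpoint marker name: {name!r}")
    return CheckpointMarker(int(m["ts"]), m["id"], m["kind"])


# File names copied from the startMarker/endMarker values in the log (directories omitted).
start = parse_marker("cp/1548909757044-63969238-f350-4b12-bdf5-f7a540021e58-START.bin")
end = parse_marker("cp/1548909575263-435715b4-71a9-4c2b-90ef-d831ed575ffc-END.bin")

# A START marker that is newer than the latest END marker and has a different ID
# suggests a checkpoint that began but never completed.
unfinished = (
    start.timestamp_ms > end.timestamp_ms
    and start.checkpoint_id != end.checkpoint_id
)

gap_ms = start.timestamp_ms - end.timestamp_ms
print(f"unfinished checkpoint: {unfinished}, START is {gap_ms} ms newer than END")
```

If this reading is right, the START marker is about 182 seconds newer than the last END marker and carries a different checkpoint ID, which matches the warning that the node stopped in the middle of a checkpoint. It does not explain why the checkpoint record is missing from the WAL.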



Regards

Krupa

On Fri, 1 Feb 2019 at 19:50, Ilya Kasnacheev <[email protected]>
wrote:

> Hello!
>
> It's hard to say outright. Can you provide full log before node crash? Is
> there a chance that you ran out of disk space? What's your WALMode?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Fri, 1 Feb 2019 at 08:16, radha jai <[email protected]> wrote:
>
>> Hi,
>>    Ignite is deployed on k8s with 12 ignite-servers, spread out one per
>> worker node.  The limits are 1 CPU and 32GB RAM, with a maximum of 8 CPU
>> and 64GB.  Each ignite-server has a WAL and persistent storage volume of
>> 30GB.
>>    After inserting 60GB of data into the Ignite cluster, one of the nodes
>> crashes and never recovers.  The error on startup indicates that the WAL
>> fails to restore memory state:
>>    type=CRITICAL_ERROR, err=class o.a.i.i.pagemem.wal.StorageException:
>> Failed to restore memory state (checkpoint marker is present on disk, but
>> checkpoint record is missed in WAL)
>>
>> The following warning message is seen in some of the server logs:
>>
>> [03:53:53,375][WARNING][jvm-pause-detector-worker][] Possible too long
>> JVM pause: 1022 milliseconds.
>>
>>
>> A snippet of the Ignite configuration is below:
>>
>>
>> <property name="peerClassLoadingEnabled" value="true"/>
>>
>>  <property name="dataStorageConfiguration">
>>
>>       <bean
>> class="org.apache.ignite.configuration.DataStorageConfiguration">
>>
>>           <!-- Enable metrics for Ignite persistence  -->
>>
>>           <property name="metricsEnabled" value="true"/>
>>
>>           <property name="defaultDataRegionConfiguration">
>>
>>               <bean
>> class="org.apache.ignite.configuration.DataRegionConfiguration">
>>
>>
>>                   <property name="name" value="Default_Region"/>
>>
>>                   <property name="initialSize" value="#{32L * 1024 * 1024 * 1024}"/>
>>
>>                   <property name="maxSize" value="#{64L * 1024 * 1024 * 1024}"/>
>>
>>                   <!-- Enabling Apache Ignite Persistent Store. -->
>>
>>                   <property name="persistenceEnabled" value="true"/>
>>
>>                   <!-- Enable metrics for this data region  -->
>>
>>                   <property name="metricsEnabled" value="true"/>
>>
>>               </bean>
>>
>>           </property>
>>
>>           <property name="storagePath" value="/opt/ignite/persistence/"/>
>>
>>           <property name="walPath" value="/opt/ignite/wal/"/>
>>
>>       </bean>
>>
>>   </property>
>>
>>
>> Ignite JVM configuration:  -server -Xms1g -Xmx1g -XX:+AlwaysPreTouch
>> -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
>>
>>
>> Thanks
>>
>> radha
>>
>>
>>
>>
