Hi,

You could remove the *work/db/wal* folder and restart the cluster, but
back up the whole *work/* directory first.
This workaround skips applying the latest changes from the WAL and simply
loads the last checkpoint.
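Here is a minimal sketch of those two steps in plain Java (java.nio.file only). It assumes the Ignite node is stopped and that you pass in the directories yourself. The class name WalReset and its methods are hypothetical helpers, not part of Ignite. By default the WAL lives under *work/db/wal*. With a custom walPath, as in the configuration quoted below (/naver/ignite_storage/20180330/wal), pass that directory instead. Because it then sits outside the work directory, the sketch backs it up separately before deleting it.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not an Ignite API): back up first, then drop the WAL.
// Assumption: run it only while the Ignite node owning these directories is stopped.
final class WalReset {

    // Recursively copies the tree rooted at src into dst, keeping the layout.
    static void copyTree(Path src, Path dst) throws IOException {
        Files.walkFileTree(src, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                Files.createDirectories(dst.resolve(src.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.copy(file, dst.resolve(src.relativize(file)), StandardCopyOption.COPY_ATTRIBUTES);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // Recursively deletes dir; deepest entries come first, so each directory is empty when removed.
    static void deleteTree(Path dir) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
    }

    // Step 1: full backup of workDir (plus walDir if it lives outside workDir).
    // Step 2: delete walDir so the node can only restore from the last checkpoint.
    static void resetWal(Path workDir, Path walDir, Path backupDir) throws IOException {
        Path work = workDir.toAbsolutePath().normalize();
        Path wal = walDir.toAbsolutePath().normalize();
        Path backup = backupDir.toAbsolutePath().normalize();

        if (Files.exists(backup))
            throw new IOException("Backup target already exists: " + backup);
        if (backup.startsWith(work) || backup.startsWith(wal))
            throw new IOException("Backup target must be outside the directories being backed up");

        copyTree(work, backup.resolve("work"));
        if (!wal.startsWith(work) && Files.exists(wal))
            copyTree(wal, backup.resolve("wal"));

        if (Files.exists(wal))
            deleteTree(wal);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: WalReset <workDir> <walDir> <backupDir>");
            System.exit(1);
        }
        resetWal(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

For the default layout, an invocation might look like `java WalReset work work/db/wal /backup/ignite-work`. The backup directory must not exist yet; the sketch refuses to overwrite it rather than risk mixing old and new files.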

2018-04-30 12:01 GMT+03:00 yonggu.lee <[email protected]>:

> Our Ignite cluster is stuck in an inactive state and cannot be restored from a
> checkpoint.
>
> When the cluster is activated, the following exception occurs:
>
> [17:40:54,750][INFO][exchange-worker-#122][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/naver/ignite_storage/20180330/storage/node00-698bff11-10c4-4fa9-87bf-07f22714951e/cp/1525070153790-cd46119a-51cd-49af-9ffa-0dccca84fb20-START.bin, endMarker=/naver/ignite_storage/20180330/storage/node00-698bff11-10c4-4fa9-87bf-07f22714951e/cp/1525070153790-cd46119a-51cd-49af-9ffa-0dccca84fb20-END.bin]
> [17:40:54,750][INFO][exchange-worker-#122][GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=106922, fileOffset=3457606, len=299101, forceFlush=false], lastCheckpointId=cd46119a-51cd-49af-9ffa-0dccca84fb20]
> [17:40:54,818][SEVERE][exchange-worker-#122][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], discoEvt=DiscoveryCustomEvent [customMsg=ChangeGlobalStateMessage [id=9a375b51361-acca12ae-d9fb-4e21-a282-3bc7af575257, reqId=b3985722-b063-4e5a-831e-9f84d656df96, initiatingNodeId=c6e1394e-bf7a-4fe4-a1bf-f64193bd44f4, activate=true], affTopVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=c6e1394e-bf7a-4fe4-a1bf-f64193bd44f4, addrs=[10.116.24.222, 10.244.5.0, 127.0.0.1, 172.17.0.1, 192.168.193.192], sockAddrs=[/10.244.5.0:47500, /172.17.0.1:47500, /192.168.193.192:47500, /127.0.0.1:47500, /10.116.24.222:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1525077608394, loc=false, ver=2.3.0#20171220-sha1:8431829c, isClient=false], topVer=12, nodeId8=e8f4c909, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1525077647980]], nodeId=c6e1394e, evt=DISCOVERY_CUSTOM_EVT]
> java.lang.IndexOutOfBoundsException: index 890
>         at java.util.concurrent.atomic.AtomicReferenceArray.checkedByteOffset(AtomicReferenceArray.java:78)
>         at java.util.concurrent.atomic.AtomicReferenceArray.get(AtomicReferenceArray.java:125)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.forceCreatePartition(GridDhtPartitionTopologyImpl.java:767)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:1777)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:1637)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1072)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:863)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1019)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:651)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> [17:40:54,818][INFO][exchange-worker-#122][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], resVer=null, err=java.lang.IndexOutOfBoundsException: index 890]
> [17:40:54,830][SEVERE][exchange-worker-#122][GridCachePartitionExchangeManager] Failed to wait for completion of partition map exchange (preloading will not start): GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryCustomEvent [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=c6e1394e-bf7a-4fe4-a1bf-f64193bd44f4, addrs=[10.116.24.222, 10.244.5.0, 127.0.0.1, 172.17.0.1, 192.168.193.192], sockAddrs=[/10.244.5.0:47500, /172.17.0.1:47500, /192.168.193.192:47500, /127.0.0.1:47500, /10.116.24.222:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1525077608394, loc=false, ver=2.3.0#20171220-sha1:8431829c, isClient=false], topVer=12, nodeId8=e8f4c909, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1525077647980]], crd=TcpDiscoveryNode [id=8e65440a-df65-4770-9a7b-26672bd574a3, addrs=[10.116.25.32, 10.244.6.0, 127.0.0.1, 172.17.0.1, 192.168.82.128], sockAddrs=[/10.244.6.0:47500, /10.116.25.32:47500, /172.17.0.1:47500, /192.168.82.128:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1525077608394, loc=false, ver=2.3.0#20171220-sha1:8431829c, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], discoEvt=DiscoveryCustomEvent [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=c6e1394e-bf7a-4fe4-a1bf-f64193bd44f4, addrs=[10.116.24.222, 10.244.5.0, 127.0.0.1, 172.17.0.1, 192.168.193.192], sockAddrs=[/10.244.5.0:47500, /172.17.0.1:47500, /192.168.193.192:47500, /127.0.0.1:47500, /10.116.24.222:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1525077608394, loc=false, ver=2.3.0#20171220-sha1:8431829c, isClient=false], topVer=12, nodeId8=e8f4c909, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1525077647980]], nodeId=c6e1394e, evt=DISCOVERY_CUSTOM_EVT], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=false, hash=989374705], init=false, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=1], futures=[]]]], exchActions=null, affChangeMsg=null, initTs=1525077647990, centralizedAff=false, changeGlobalStateE=null, done=true, state=SRV, evtLatch=0, remaining=[8e65440a-df65-4770-9a7b-26672bd574a3, 18184b4a-0fe0-4fff-a917-a3b03f16a509, c6e1394e-bf7a-4fe4-a1bf-f64193bd44f4, 491c9af5-e855-42d8-b617-e72bf3099a46, 4aae4b1e-6ef4-43ac-b156-f5445adb40c6, 91d8036d-c74f-48d7-b389-82ebba96adf2, 95e77e2f-ba25-4c12-b9b0-d1b21386eb36, c12d30c3-bf9e-4c58-9468-8ef878ec2679, 1edbfd89-a03f-4fea-93b9-d058eb93f66b], super=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=java.lang.IndexOutOfBoundsException: index 890, hash=328088520]]
> class org.apache.ignite.IgniteCheckedException: index 890
>         at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7252)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:259)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:207)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2289)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IndexOutOfBoundsException: index 890
>         at java.util.concurrent.atomic.AtomicReferenceArray.checkedByteOffset(AtomicReferenceArray.java:78)
>         at java.util.concurrent.atomic.AtomicReferenceArray.get(AtomicReferenceArray.java:125)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.forceCreatePartition(GridDhtPartitionTopologyImpl.java:767)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:1777)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:1637)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1072)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:863)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1019)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:651)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2279)
>         ... 2 more
>
> After that, I cannot perform any operations.
>
> This problem started after I cancelled (Ctrl+C) a service deployment.
> At that time, another job was writing to a cache. I had only changed the
> sticky parameter of the service deployment (from false to true), but the
> deployment was too slow, so I cancelled it. Then I restarted the cluster,
> and the problem began.
>
> Is there any solution or workaround for this error, such as skipping the
> checkpoint restore process? It's OK for me to lose some recent cache
> updates.
>
> The Ignite version is 2.3.0, and the config is as follows:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <beans xmlns="http://www.springframework.org/schema/beans"
>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>        xsi:schemaLocation="
>        http://www.springframework.org/schema/beans
>        http://www.springframework.org/schema/beans/spring-beans.xsd">
>     <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>
>
>         <property name="serviceThreadPoolSize" value="80"/>
>
>         <property name="failureDetectionTimeout" value="3600000"/>
>
>
>         <property name="cacheConfiguration">
>             <list>
>
>                 <bean class="org.apache.ignite.configuration.CacheConfiguration">
>                     <property name="name" value="valid_dup_ratio_cache_name"/>
>                     <property name="atomicityMode" value="ATOMIC"/>
>                     <property name="cacheMode" value="REPLICATED"/>
>                     <property name="indexedTypes">
>                         <list>
>                             <value>java.lang.String</value>
>                             <value>java.util.LinkedList</value>
>                         </list>
>                     </property>
>                 </bean>
>
>
>                 <bean class="org.apache.ignite.configuration.CacheConfiguration">
>                     <property name="name" value="dup_ratio_hbase_read_through"/>
>                     <property name="atomicityMode" value="ATOMIC"/>
>                     <property name="cacheMode" value="PARTITIONED"/>
>                     <property name="onheapCacheEnabled" value="true"/>
>                     <property name="evictionPolicy">
>                         <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicy">
>                             <property name="batchSize" value="5"/>
>
>                         </bean>
>                     </property>
>
>                     <property name="expiryPolicyFactory">
>                         <bean id="expiryPolicy" class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
>                             <constructor-arg>
>                                 <bean class="javax.cache.expiry.Duration">
>                                     <constructor-arg value="HOURS"/>
>                                     <constructor-arg value="24"/>
>                                 </bean>
>                             </constructor-arg>
>                         </bean>
>                     </property>
>
>                     <property name="cacheStoreFactory">
>                         <bean class="javax.cache.configuration.FactoryBuilder" factory-method="factoryOf">
>                             <constructor-arg value="com.naver.kweb.serp.title.ignite.read_through.HBaseDupRatioAdapter"/>
>                         </bean>
>                     </property>
>                     <property name="readThrough" value="true"/>
>                     <property name="writeThrough" value="false"/>
>                 </bean>
>             </list>
>         </property>
>
>
>         <property name="discoverySpi">
>             <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>                 <property name="clientReconnectDisabled" value="false"/>
>                 <property name="networkTimeout" value="120000"/>
>                 <property name="ipFinder">
>
>                     <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>                         <property name="addresses">
>                             <list>
>                                 <value>csb7x0876.nfra.io:47500..47509</value>
>                                 <value>csb7x0877.nfra.io:47500..47509</value>
>                                 <value>csb7x0878.nfra.io:47500..47509</value>
>                                 <value>csb7x0879.nfra.io:47500..47509</value>
>                                 <value>csb7x0880.nfra.io:47500..47509</value>
>                                 <value>csb7x0881.nfra.io:47500..47509</value>
>                                 <value>csb7x0882.nfra.io:47500..47509</value>
>                                 <value>csb7x0883.nfra.io:47500..47509</value>
>                                 <value>csb7x0884.nfra.io:47500..47509</value>
>                                 <value>csb7x0885.nfra.io:47500..47509</value>
>                             </list>
>                         </property>
>                     </bean>
>                 </property>
>             </bean>
>         </property>
>
>
>         <property name="dataStorageConfiguration">
>             <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>
>
>                 <property name="writeThrottlingEnabled" value="true"/>
>
>                 <property name="defaultDataRegionConfiguration">
>                     <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>                         <property name="persistenceEnabled" value="true"/>
>                         <property name="name" value="Default_Region"/>
>                         <property name="maxSize" value="#{75L * 1024 * 1024 * 1024}"/>
>                         <property name="checkpointPageBufferSize" value="#{1024L * 1024 * 1024}"/>
>                         <property name="metricsEnabled" value="true"/>
>                     </bean>
>                 </property>
>                 <property name="storagePath" value="/naver/ignite_storage/20180330/storage"/>
>                 <property name="walPath" value="/naver/ignite_storage/20180330/wal"/>
>                 <property name="walArchivePath" value="/naver/ignite_storage/20180330/walArchive"/>
>                 <property name="metricsEnabled" value="true"/>
>             </bean>
>         </property>
>
>
>         <property name="binaryConfiguration">
>             <bean class="org.apache.ignite.configuration.BinaryConfiguration">
>                 <property name="typeConfigurations">
>                     <list>
>                         <bean class="org.apache.ignite.binary.BinaryTypeConfiguration">
>                             <property name="typeName" value="com.naver.kweb.serp.title.ignite.service.TitleMakerServiceImpl"/>
>                         </bean>
>                     </list>
>                 </property>
>             </bean>
>         </property>
>     </bean>
> </beans>
>
> Thanks.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>



-- 

Regards

Pavel Vinokurov
