Denis Chudov created IGNITE-15295:
-------------------------------------

             Summary: Server node that has an empty checkpoint 
file-XXX-START.bin does not start
                 Key: IGNITE-15295
                 URL: https://issues.apache.org/jira/browse/IGNITE-15295
             Project: Ignite
          Issue Type: Improvement
            Reporter: Denis Chudov
            Assignee: Denis Chudov


A server node that has an empty checkpoint marker file (file-XXX-START.bin) 
fails to start with the following exception:
{code:java}
2021-06-08 16:00:33.383[ERROR][Thread-19][o.a.i.i.IgniteKernal%DPL_GRID%DplGridNodeName] Exception during start processors, node will be stopped and close connections
java.nio.BufferUnderflowException: null
        at java.nio.Buffer.nextGetIndex(Buffer.java:532)
        at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:417)
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage.readPointer(CheckpointMarkersStorage.java:301)
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage.readCheckpointStatus(CheckpointMarkersStorage.java:218)
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointManager.readCheckpointStatus(CheckpointManager.java:265)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointStatus(GridCacheDatabaseSharedManager.java:1642)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:584)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:2999)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1205)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2105)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1768)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1147)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:667)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:593)
        at org.apache.ignite.Ignition.start(Ignition.java:319)
        at com.sbt.ignite.factory.IgniteFactory.getOrStartIgnite(IgniteFactory.java:139)
        at com.sbt.ignite.factory.IgniteFactory.getOrStartIgnite(IgniteFactory.java:91)
        at com.sbt.ignite.manager.IgniteLifecycleManagerImpl.startIgnite(IgniteLifecycleManagerImpl.java:82)
        at com.sbt.ignite.manager.IgniteLifecycleManagerImpl.init(IgniteLifecycleManagerImpl.java:73)
        at com.sbt.dpl.gridgain.container.DPLManagerLifecycleManager.initIgniteServiceHolder(DPLManagerLifecycleManager.java:170)
        at com.sbt.dpl.gridgain.container.DPLManagerLifecycleManager.dplContextInit(DPLManagerLifecycleManager.java:145)
        at com.sbt.dpl.gridgain.container.ContainerDPLFactory.<init>(ContainerDPLFactory.java:80)
        at com.sbt.dpl.gridgain.springsupport.SpringDPLFactory.init(SpringDPLFactory.java:74)
{code}
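The top two frames of the trace can be reproduced in isolation. The following is a minimal sketch, not Ignite code: the class and method names are hypothetical, and it only shows that reading a long from a heap buffer backed by zero bytes throws BufferUnderflowException.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class EmptyMarkerDemo {
    /**
     * Wraps the given file content in a heap buffer and tries to read one long from it.
     * Returns true if the read underflows. Hypothetical helper for illustration only.
     */
    static boolean underflowsOnRead(byte[] fileContent) {
        ByteBuffer buf = ByteBuffer.wrap(fileContent);

        try {
            buf.getLong(); // Needs 8 remaining bytes.

            return false;
        }
        catch (BufferUnderflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(underflowsOnRead(new byte[0])); // Empty file content: prints true.
        System.out.println(underflowsOnRead(new byte[8])); // Enough bytes for one long: prints false.
    }
}
```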
The checkpoint marker is always fully written to a temporary file first, and 
then this file is renamed (see
{noformat}
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage#writeCheckpointEntry(java.nio.ByteBuffer, org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry, org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntryType, boolean)
{noformat}
).
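The write-then-rename pattern described above can be sketched as follows. This is an illustration under assumptions, not the actual writeCheckpointEntry implementation: the class name, the method name, the ".tmp" suffix and the file name are all hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicMarkerWrite {
    /** Writes the data to a sibling temp file, syncs it to disk, then renames it to the target. */
    static void writeMarker(Path target, ByteBuffer data) throws IOException {
        // Hypothetical temp-file naming; the real code may use a different suffix.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");

        try (FileChannel ch = FileChannel.open(tmp,
            StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            while (data.hasRemaining())
                ch.write(data);

            ch.force(true); // Flush content and metadata before the rename.
        }

        // Readers see either no target file or a complete one.
        // If the target already exists, the outcome of ATOMIC_MOVE is implementation specific.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cp-demo");
        Path marker = dir.resolve("marker-START.bin"); // Hypothetical file name.

        writeMarker(marker, ByteBuffer.allocate(16).putLong(0, 42L));

        System.out.println(Files.size(marker)); // Prints 16.
    }
}
```

With this pattern a crash during the write should leave at most a stale temp file behind, which is why an empty final marker file is unexpected.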

So the root cause of this error is unclear, unless the file was somehow changed 
afterwards. We will need extended information if this error happens again, but 
in this case there is nothing to analyze (the LFS was cleared by the customer 
right after the error occurred).

At the same time, we cannot guarantee correct operation when checkpoint markers 
are inconsistent. We cannot simply ignore broken markers, and recovering from 
the previous checkpoint is not that simple either.

Still, it seems reasonable to catch all read-related exceptions in 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage#readPointer, 
so that extended information is available if the error happens again.
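One possible shape for such a guarded read is sketched below. It is not the actual readPointer code. The Pointer class, the assumed layout (one long followed by two ints) and the message format are hypothetical and only illustrate wrapping read failures with the file path and size.

```java
import java.io.IOException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

public class GuardedMarkerRead {
    /** Assumed marker size: one long plus two ints. The real layout may differ. */
    static final int POINTER_SIZE = Long.BYTES + 2 * Integer.BYTES;

    /** Hypothetical stand-in for the pointer stored in a marker file. */
    static final class Pointer {
        final long idx;
        final int off;
        final int len;

        Pointer(long idx, int off, int len) {
            this.idx = idx;
            this.off = off;
            this.len = len;
        }
    }

    /** Reads a pointer and rethrows any read failure with the file path and size attached. */
    static Pointer readPointer(Path file) throws IOException {
        long size = -1; // Stays -1 if the file could not be read at all.

        try {
            byte[] bytes = Files.readAllBytes(file);

            size = bytes.length;

            ByteBuffer buf = ByteBuffer.wrap(bytes);

            return new Pointer(buf.getLong(), buf.getInt(), buf.getInt());
        }
        catch (IOException | BufferUnderflowException e) {
            throw new IOException("Failed to read checkpoint pointer from marker file [file="
                + file.toAbsolutePath() + ", size=" + size + ", expectedSize=" + POINTER_SIZE + ']', e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path empty = Files.createTempFile("marker", "-START.bin"); // Created empty.

        try {
            readPointer(empty);
        }
        catch (IOException e) {
            System.out.println(e.getMessage()); // Message includes the path and size=0.
        }
    }
}
```

The node still fails to start in this sketch, but the log would then name the broken file and its actual size instead of showing a bare BufferUnderflowException.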



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
