[jira] [Commented] (IGNITE-17215) Write ClusterSnapshotRecord to WAL
[ https://issues.apache.org/jira/browse/IGNITE-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570796#comment-17570796 ]

Maksim Timonin commented on IGNITE-17215:
-----------------------------------------

[~xtern] hi! I found the reason: this is actually a bug introduced in https://issues.apache.org/jira/browse/IGNITE-17272. I prepared a fix in https://issues.apache.org/jira/browse/IGNITE-17408.

> Write ClusterSnapshotRecord to WAL
> ----------------------------------
>
>                 Key: IGNITE-17215
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17215
>             Project: Ignite
>          Issue Type: New Feature
>            Reporter: Maksim Timonin
>            Assignee: Maksim Timonin
>            Priority: Major
>              Labels: IEP-89
>             Fix For: 2.14
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The PITR [1] process recovers data from a ClusterSnapshot plus archived WALs.
> It requires a point in the WAL that splits the whole WAL into 2 areas:
> # Before this point, all data changes are contained within the ClusterSnapshot,
> and there is no need to recover them from archived WAL files.
> # After this point, all data needs to be recovered from archived WAL files.
> It's proposed to write ClusterSnapshotRecord while the checkpoint is running
> (cp#writeLock is acquired). The ClusterSnapshot process guarantees:
> # there are no active transactions (or any data changes) at the moment the
> checkpoint is running.
> # the ClusterSnapshot contains all data pages that will be persisted within this
> checkpoint process.
> Then every logical record after the begin CheckpointRecord doesn't belong to the
> ClusterSnapshot, so it's safe to write ClusterSnapshotRecord within the
> checkpoint process.
> [1]
> [https://cwiki.apache.org/confluence/pages/editpage.action?pageId=211884314]

--
This message was sent by Atlassian Jira (v8.20.10#820010)
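The split described in the issue can be sketched as follows. This is only a toy illustration under stated assumptions, not Ignite code: the `Rec` and `Type` types and the `recordsToReplay` helper are hypothetical names invented here, and the WAL is modeled as a plain in-memory list. It shows the idea that a recovery process would skip everything up to the snapshot marker and replay only the data records logged after it.

```java
import java.util.ArrayList;
import java.util.List;

public class WalSplitSketch {
    /** Hypothetical record kinds; these are not Ignite's real WAL record types. */
    enum Type { DATA, CHECKPOINT_BEGIN, CLUSTER_SNAPSHOT }

    /** Hypothetical, simplified WAL record: a type tag plus a text payload. */
    static final class Rec {
        final Type type;
        final String payload;

        Rec(Type type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    /**
     * Returns the payloads of DATA records logged after the snapshot marker.
     * Records before the marker are assumed to be covered by the snapshot.
     */
    static List<String> recordsToReplay(List<Rec> wal, String snapshotName) {
        int marker = -1;

        // Find the first marker record written for the requested snapshot.
        for (int i = 0; i < wal.size(); i++) {
            Rec r = wal.get(i);

            if (r.type == Type.CLUSTER_SNAPSHOT && r.payload.equals(snapshotName)) {
                marker = i;
                break;
            }
        }

        if (marker < 0)
            throw new IllegalStateException("No snapshot marker for: " + snapshotName);

        // Everything after the marker has to be recovered from the WAL.
        List<String> res = new ArrayList<>();

        for (int i = marker + 1; i < wal.size(); i++) {
            if (wal.get(i).type == Type.DATA)
                res.add(wal.get(i).payload);
        }

        return res;
    }

    public static void main(String[] args) {
        List<Rec> wal = List.of(
            new Rec(Type.DATA, "put k1"),
            new Rec(Type.CHECKPOINT_BEGIN, "cp1"),
            new Rec(Type.CLUSTER_SNAPSHOT, "snp1"),
            new Rec(Type.DATA, "put k2"),
            new Rec(Type.DATA, "put k3"));

        System.out.println(recordsToReplay(wal, "snp1")); // prints [put k2, put k3]
    }
}
```

In this model the marker sits right after the begin checkpoint record, mirroring the proposal to write it while the checkpoint write lock is held, so no data record can slip in between the two.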
[jira] [Commented] (IGNITE-17215) Write ClusterSnapshotRecord to WAL
[ https://issues.apache.org/jira/browse/IGNITE-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569287#comment-17569287 ]

Maksim Timonin commented on IGNITE-17215:
-----------------------------------------

Hi [~xtern] thanks for catching that, I'll have a look
[jira] [Commented] (IGNITE-17215) Write ClusterSnapshotRecord to WAL
[ https://issues.apache.org/jira/browse/IGNITE-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565890#comment-17565890 ]

Pavel Pereslegin commented on IGNITE-17215:
-------------------------------------------

[~timonin.maksim], [~NSAmelchev], Could you please take a look why this test started to fail?

https://ci2.ignite.apache.org/project.html?projectId=IgniteTests24Java8=2699781523141512829=testDetails_IgniteTests24Java8=%3Cdefault%3E

{noformat}
class org.apache.ignite.IgniteException: Failed to read WAL record at position: 54496847, size: 67108669, expectedPtr: WALPointer [idx=0, fileOff=54496847, len=0]
    at org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:48)
    at org.apache.ignite.internal.processors.cache.persistence.snapshot.AbstractSnapshotSelfTest.checkSnpWalRecords(AbstractSnapshotSelfTest.java:682)
    at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManagerSelfTest.testSnapshotLocalPartitionMultiCpWithLoad(IgniteSnapshotManagerSelfTest.java:220)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2428)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at position: 54496847, size: 67108669, expectedPtr: WALPointer [idx=0, fileOff=54496847, len=0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.handleRecordException(AbstractWalRecordsIterator.java:327)
    at org.apache.ignite.internal.processors.cache.persistence.wal.reader.StandaloneWalRecordsIterator.handleRecordException(StandaloneWalRecordsIterator.java:389)
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:291)
    at org.apache.ignite.internal.processors.cache.persistence.wal.reader.StandaloneWalRecordsIterator.advanceRecord(StandaloneWalRecordsIterator.java:286)
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advance(AbstractWalRecordsIterator.java:176)
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:135)
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:52)
    at org.apache.ignite.internal.util.GridCloseableIteratorAdapter.nextX(GridCloseableIteratorAdapter.java:44)
    at org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:35)
    ... 14 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at position: 54496847, size: 67108669, expectedPtr: WALPointer [idx=0, fileOff=54496847, len=0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:396)
    at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer.readRecord(RecordV2Serializer.java:244)
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:273)
    ... 20 more
Caused by: java.nio.InvalidMarkException
    at java.base/java.nio.Buffer.reset(Buffer.java:399)
    at java.base/java.nio.ByteBuffer.reset(ByteBuffer.java:1197)
    at java.base/java.nio.MappedByteBuffer.reset(MappedByteBuffer.java:253)
    at java.base/java.nio.MappedByteBuffer.reset(MappedByteBuffer.java:67)
    at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer$1.readWithHeaders(RecordV2Serializer.java:129)
    at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:374)
    ... 22 more
{noformat}
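For readers unfamiliar with the innermost exception in the trace above: `java.nio.InvalidMarkException` is what `Buffer.reset()` throws when the buffer has no mark set. The sketch below only demonstrates that documented JDK behavior on a plain heap `ByteBuffer`. It makes no claim about what the actual bug in the WAL reader was; the class and method names (`BufferMarkSketch`, `resetFails`) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

public class BufferMarkSketch {
    /** Returns true if reset() fails because the buffer has no mark set. */
    static boolean resetFails(ByteBuffer buf) {
        try {
            buf.reset();
            return false;
        }
        catch (InvalidMarkException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A fresh buffer has no mark, so reset() fails.
        ByteBuffer fresh = ByteBuffer.allocate(16);
        System.out.println(resetFails(fresh)); // prints true

        // mark() followed by reset() returns the position to the marked one.
        ByteBuffer marked = ByteBuffer.allocate(16);
        marked.position(4);
        marked.mark();
        marked.position(8);
        System.out.println(resetFails(marked)); // prints false
        System.out.println(marked.position()); // prints 4

        // Moving the position below the mark discards the mark.
        ByteBuffer discarded = ByteBuffer.allocate(16);
        discarded.position(8);
        discarded.mark();
        discarded.position(4);
        System.out.println(resetFails(discarded)); // prints true
    }
}
```

Calls such as `clear()`, `flip()`, and `rewind()` also discard the mark, so any code path that repositions a buffer between `mark()` and `reset()` can end in this exception.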
[jira] [Commented] (IGNITE-17215) Write ClusterSnapshotRecord to WAL
[ https://issues.apache.org/jira/browse/IGNITE-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564227#comment-17564227 ]

Maksim Timonin commented on IGNITE-17215:
-----------------------------------------

[~NSAmelchev] thanks for review. Merged to master
[jira] [Commented] (IGNITE-17215) Write ClusterSnapshotRecord to WAL
[ https://issues.apache.org/jira/browse/IGNITE-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564225#comment-17564225 ]

Ignite TC Bot commented on IGNITE-17215:
----------------------------------------

{panel:title=Branch: [pull/10105/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10105/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6672103buildTypeId=IgniteTests24Java8_RunAll]
[jira] [Commented] (IGNITE-17215) Write ClusterSnapshotRecord to WAL
[ https://issues.apache.org/jira/browse/IGNITE-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562486#comment-17562486 ]

Ignite TC Bot commented on IGNITE-17215:
----------------------------------------

{panel:title=Branch: [pull/10105/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10105/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6641005buildTypeId=IgniteTests24Java8_RunAll]

> Write ClusterSnapshotRecord to WAL
> ----------------------------------
>
>                 Key: IGNITE-17215
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17215
>             Project: Ignite
>          Issue Type: New Feature
>            Reporter: Maksim Timonin
>            Assignee: Maksim Timonin
>            Priority: Major
>              Labels: IEP-89
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The PITR [1] process recovers data from a ClusterSnapshot plus archived WALs.
> It requires a point in the WAL that splits the whole WAL into 2 areas:
> # Before this point, all data changes are contained within the ClusterSnapshot,
> and there is no need to recover them from archived WAL files.
> # After this point, all data needs to be recovered from archived WAL files.
> It's proposed to write ClusterSnapshotRecord at the moment the checkpoint process
> starts (cp#writeLock is acquired). The ClusterSnapshot process guarantees:
> # there are no active transactions (or any data changes) at the moment of writing
> the begin CheckpointRecord.
> # the ClusterSnapshot consists of data pages that are materialized within this
> checkpoint process.
> Then every logical record after the begin CheckpointRecord doesn't belong to the
> ClusterSnapshot, so it's safe to write ClusterSnapshotRecord aligned with the
> CheckpointRecord.
>
> [1]
> [https://cwiki.apache.org/confluence/pages/editpage.action?pageId=211884314]

--
This message was sent by Atlassian Jira (v8.20.10#820010)