[jira] [Comment Edited] (IGNITE-19904) Assertion in defragmentation
    [ https://issues.apache.org/jira/browse/IGNITE-19904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749563#comment-17749563 ]

Vladimir Steshin edited comment on IGNITE-19904 at 8/1/23 11:43 AM:

Caused by the concurrent default checkpointer, which clears the shared progress counters via
{code:java}
CheckpointProgress#clearCounters()
{code}
and raises a hidden NPE in
{code:java}
@Override public void CheckpointProgressImpl#updateEvictedPages(int delta) {
    A.ensure(delta > 0, "param must be positive");

    if (evictedPagesCounter() != null)
        evictedPagesCounter().addAndGet(delta);
}
{code}
while flushing a replaced page in `PageMemoryImpl#allocatePage(int grpId, int partId, byte flags)`. See IGNITE-20047 and 'failure_with_root_npe_cause.log'.

was (Author: vladsz83):
Caused by concurrent default checkpointer which clears shared
{code:java}
CheckpointProgress#clearCounters()
{code}
and rises hidden NPE in
{code:java}
@Override public void CheckpointProgressImpl#updateEvictedPages(int delta) {
    A.ensure(delta > 0, "param must be positive");

    if (evictedPagesCounter() != null)
        evictedPagesCounter().addAndGet(delta);
}
{code}
while flushing replaced page in `PageMemoryImpl#allocatePage(int grpId, int partId, byte flags)`. See IGNITE-20047.

> Assertion in defragmentation
>
> Key: IGNITE-19904
> URL: https://issues.apache.org/jira/browse/IGNITE-19904
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.12
> Reporter: Vladimir Steshin
> Priority: Major
> Labels: ise
> Attachments: default-config.xml, failure2.16_with_thread_dump.log, failure_with_root_npe_cause.log, ignite.log, jvm.opts
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Defragmentation fails with:
> {code:java}
> java.lang.AssertionError: Invalid state. Type is 0! pageId = 0001000d00024cbf
>     at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.copyPageForCheckpoint(PageMemoryImpl.java:1359) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
>     at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.checkpointWritePage(PageMemoryImpl.java:1277) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.writePages(CheckpointPagesWriter.java:208) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:150) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
> {code}
> Difficult to write a test. Can't reproduce on my computers :(. It appears
> flakily on a server (4 core x 4 cpu) with 100G of the test cache data and
> million+ pages to checkpoint during defragmentation. It occurs more often
> with pageSize 1024 (to produce more pages).
> Judging by my diagnostic build, I suppose that a fresh, empty page is caught
> in defragmentation. Here is a page dump with a test-extended PAGE_OVERHEAD
> (=64) and the same error a bit before copyPageForCheckpoint():
> {code:java}
> org.apache.ignite.IgniteException: Wrong page type in checkpointWritePage1.
> Page: Data region = 'defragPartitionsDataRegion'.
> FullPageId [pageId=281878703760205, effectivePageId=403727049549, grpId=-1368047378].
> PageDump = page_id: 281878703760205, rel_id: 48603, cache_id: -1368047378,
> pin: 0, lock: 65536, tmp_buf: 72057594037927935, test_val: 1. data_hex:
[jira] [Comment Edited] (IGNITE-19904) Assertion in defragmentation
    [ https://issues.apache.org/jira/browse/IGNITE-19904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749563#comment-17749563 ]

Vladimir Steshin edited comment on IGNITE-19904 at 8/1/23 8:36 AM:
---

Caused by the concurrent default checkpointer, which clears the shared progress counters via
{code:java}
CheckpointProgress#clearCounters()
{code}
and raises a hidden NPE in
{code:java}
@Override public void CheckpointProgressImpl#updateEvictedPages(int delta) {
    A.ensure(delta > 0, "param must be positive");

    if (evictedPagesCounter() != null)
        evictedPagesCounter().addAndGet(delta);
}
{code}
while flushing a replaced page in `PageMemoryImpl#allocatePage(int grpId, int partId, byte flags)`. See IGNITE-20047.

was (Author: vladsz83):
Caused by concurrent default checkpointer which clears shared
{code:java}
CheckpointProgress#clearCounters()
{code}
and rises hidden NPE in `evictedPagesCounter().`:
{code:java}
@Override public void CheckpointProgressImpl#updateEvictedPages(int delta) {
    A.ensure(delta > 0, "param must be positive");

    if (evictedPagesCounter() != null)
        evictedPagesCounter().addAndGet(delta);
}
{code}
while flushing replaced in `PageMemoryImpl#allocatePage(int grpId, int partId, byte flags)`

> Assertion in defragmentation
>
> Key: IGNITE-19904
> URL: https://issues.apache.org/jira/browse/IGNITE-19904
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.12
> Reporter: Vladimir Steshin
> Priority: Major
> Labels: ise
> Attachments: default-config.xml, failure2.16_with_thread_dump.log, ignite.log, jvm.opts
>
> Defragmentation fails with:
> {code:java}
> java.lang.AssertionError: Invalid state. Type is 0! pageId = 0001000d00024cbf
>     at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.copyPageForCheckpoint(PageMemoryImpl.java:1359) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
>     at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.checkpointWritePage(PageMemoryImpl.java:1277) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.writePages(CheckpointPagesWriter.java:208) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:150) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
> {code}
> Difficult to write a test. Can't reproduce on my computers :(. It appears
> flakily on a server (4 core x 4 cpu) with 100G of the test cache data and
> million+ pages to checkpoint during defragmentation. It occurs more often
> with pageSize 1024 (to produce more pages).
> Judging by my diagnostic build, I suppose that a fresh, empty page is caught
> in defragmentation. Here is a page dump with a test-extended PAGE_OVERHEAD
> (=64) and the same error a bit before copyPageForCheckpoint():
> {code:java}
> org.apache.ignite.IgniteException: Wrong page type in checkpointWritePage1.
> Page: Data region = 'defragPartitionsDataRegion'.
> FullPageId [pageId=281878703760205, effectivePageId=403727049549, grpId=-1368047378].
> PageDump = page_id: 281878703760205, rel_id: 48603, cache_id: -1368047378,
> pin: 0, lock: 65536, tmp_buf: 72057594037927935, test_val: 1. data_hex:
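
The comment above attributes the hidden NPE to `updateEvictedPages` reading `evictedPagesCounter()` twice while another checkpointer may call `clearCounters()` in between. Below is a minimal sketch of that check-then-act pattern, not Ignite code: `ProgressSketch`, `ClearingBetweenReads`, and both `updateEvictedPages*` variants are hypothetical stand-ins, it assumes `clearCounters()` sets the counter reference to null, and the `A.ensure` argument check is omitted. The subclass simulates the concurrent clear deterministically by nulling the counter right after the first read. The single-read variant is only one possible way to avoid the NPE; it is not claimed to be the fix chosen in IGNITE-20047.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a shared checkpoint progress object. */
class ProgressSketch {
    /** Assumption: clearCounters() nulls this reference, possibly from another thread. */
    private volatile AtomicInteger evictedPagesCntr = new AtomicInteger();

    AtomicInteger evictedPagesCounter() {
        return evictedPagesCntr;
    }

    void clearCounters() {
        evictedPagesCntr = null;
    }

    /** Check-then-act: the getter is called twice, so the second call may return null. */
    void updateEvictedPagesRacy(int delta) {
        if (evictedPagesCounter() != null)
            evictedPagesCounter().addAndGet(delta);
    }

    /** Single read: the null check and the update use the same local reference. */
    void updateEvictedPagesSafe(int delta) {
        AtomicInteger cntr = evictedPagesCounter();

        if (cntr != null)
            cntr.addAndGet(delta);
    }
}

/** Simulates a concurrent clearCounters() landing right after the first read of the counter. */
class ClearingBetweenReads extends ProgressSketch {
    private int reads;

    @Override AtomicInteger evictedPagesCounter() {
        AtomicInteger cur = super.evictedPagesCounter();

        if (++reads == 1)
            clearCounters();

        return cur;
    }
}

final class EvictedCounterRaceDemo {
    public static void main(String[] args) {
        try {
            new ClearingBetweenReads().updateEvictedPagesRacy(1);
            System.out.println("racy variant: no exception");
        }
        catch (NullPointerException e) {
            System.out.println("racy variant: NullPointerException");
        }

        new ClearingBetweenReads().updateEvictedPagesSafe(1);
        System.out.println("single-read variant: completed");
    }
}
```

In the single-read variant an update that races with the clear lands on a counter that is no longer referenced, so that delta is lost; whether that is acceptable depends on how the counters are consumed.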