[jira] [Commented] (IGNITE-8719) Index left partially built if a node crashes during index create or rebuild
[ https://issues.apache.org/jira/browse/IGNITE-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343715#comment-17343715 ] Kirill Tkalenko commented on IGNITE-8719: - [~zstan] Please review the code. > Index left partially built if a node crashes during index create or rebuild > --- > > Key: IGNITE-8719 > URL: https://issues.apache.org/jira/browse/IGNITE-8719 > Project: Ignite > Issue Type: Bug > Components: sql >Reporter: Alexey Goncharuk >Assignee: Kirill Tkalenko >Priority: Critical > Fix For: 2.11 > > Attachments: IndexRebuildAfterNodeCrashTest.java, > IndexRebuildingTest.java > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently, we do not have any state associated with the index tree. Consider > the following scenario: > 1) Start node, put some data > 2) start CREATE INDEX operation > 3) Wait for a checkpoint and stop node before index create finished > 4) Restart node > Since the checkpoint finished, the new index tree will be persisted to the > disk, but not all data will be present in the index. > We should somehow store information about initializing index tree and mark it > valid only after all data is indexed. The state should be persisted as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
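The fix described above boils down to persisting an "index build in progress" marker that outlives checkpoints. The sketch below illustrates the idea only; the class and method names are invented for illustration and do not reflect Ignite's actual internals, and an in-memory map stands in for the persisted metastorage.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class IndexRebuildStateSketch {
    // Stand-in for persisted storage: index name -> "build pending" flag.
    // In the real fix this state would survive node restarts.
    private final Map<String, Boolean> persistedState = new HashMap<>();

    /** Persist the marker BEFORE the build starts, so a checkpoint can't outrun it. */
    void startBuild(String idxName) {
        persistedState.put(idxName, true);
    }

    /** Clear the marker only after ALL data has been indexed. */
    void finishBuild(String idxName) {
        persistedState.remove(idxName);
    }

    /** On restart, any index still carrying a marker is invalid and must be rebuilt. */
    Set<String> indexesToRebuildOnRestart() {
        return new HashSet<>(persistedState.keySet());
    }

    public static void main(String[] args) {
        IndexRebuildStateSketch state = new IndexRebuildStateSketch();
        state.startBuild("PERSON_NAME_IDX");
        // Simulate a crash between a checkpoint and build completion:
        System.out.println("rebuild on restart: " + state.indexesToRebuildOnRestart());
        state.finishBuild("PERSON_NAME_IDX");
        System.out.println("rebuild on restart: " + state.indexesToRebuildOnRestart());
    }
}
```

The key ordering property is that the marker is written before any index pages can be checkpointed, so a crash at any point leaves either no index or an index explicitly flagged as incomplete.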
[jira] [Commented] (IGNITE-8719) Index left partially built if a node crashes during index create or rebuild
[ https://issues.apache.org/jira/browse/IGNITE-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343714#comment-17343714 ] Ignite TC Bot commented on IGNITE-8719: --- {panel:title=Branch: [pull/9090/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} {panel:title=Branch: [pull/9090/head] Base: [master] : New Tests (8)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1} {color:#8b}PDS (Indexing){color} [[tests 8|https://ci.ignite.apache.org/viewLog.html?buildId=6004413]] * {color:#013220}IgnitePdsWithIndexingTestSuite: ResumeRebuildIndexTest.testTwoNodeRestart - PASSED{color} * {color:#013220}IgnitePdsWithIndexingTestSuite: ResumeRebuildIndexTest.testSingleNodeRestart - PASSED{color} * {color:#013220}IgnitePdsWithIndexingTestSuite: ResumeRebuildIndexTest.testTwoNodeReactivation - PASSED{color} * {color:#013220}IgnitePdsWithIndexingTestSuite: ResumeRebuildIndexTest.testDeleteIndexRebuildStateOnDestroyCache - PASSED{color} * {color:#013220}IgnitePdsWithIndexingTestSuite: ResumeRebuildIndexTest.testSingleNodeReactivation - PASSED{color} * {color:#013220}IgnitePdsWithIndexingTestSuite: ResumeRebuildIndexTest.testNormalFlowIndexRebuildStateStorage - PASSED{color} * {color:#013220}IgnitePdsWithIndexingTestSuite: ResumeRebuildIndexTest.testErrorFlowIndexRebuildStateStorage - PASSED{color} * {color:#013220}IgnitePdsWithIndexingTestSuite: ResumeRebuildIndexTest.testRestartNodeFlowIndexRebuildStateStorage - PASSED{color} {panel} [TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6004446&buildTypeId=IgniteTests24Java8_RunAll] > Index left partially built if a node crashes during index create or rebuild > --- > > Key: IGNITE-8719 > URL: https://issues.apache.org/jira/browse/IGNITE-8719 > Project: Ignite > Issue Type: Bug > Components: sql >Reporter: Alexey Goncharuk >Assignee: Kirill Tkalenko >Priority: Critical > Fix For: 2.11 > > Attachments: 
IndexRebuildAfterNodeCrashTest.java, > IndexRebuildingTest.java > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently, we do not have any state associated with the index tree. Consider > the following scenario: > 1) Start node, put some data > 2) start CREATE INDEX operation > 3) Wait for a checkpoint and stop node before index create finished > 4) Restart node > Since the checkpoint finished, the new index tree will be persisted to the > disk, but not all data will be present in the index. > We should somehow store information about initializing index tree and mark it > valid only after all data is indexed. The state should be persisted as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-14490) cache.invoke() triggers failure handler and freezes if entry processor is not urideployed
[ https://issues.apache.org/jira/browse/IGNITE-14490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343711#comment-17343711 ] Stanilovsky Evgeny commented on IGNITE-14490: - looks good, plz proceed. > cache.invoke() triggers failure handler and freezes if entry processor is not > urideployed > - > > Key: IGNITE-14490 > URL: https://issues.apache.org/jira/browse/IGNITE-14490 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.10 >Reporter: Ilya Kasnacheev >Assignee: Ilya Kasnacheev >Priority: Major > Attachments: LoclerMain.java > > Time Spent: 10m > Remaining Estimate: 0h > > If URI deployment is specified > Caused by: java.lang.ClassNotFoundException: [Ljava.lang.StackTraceElement;" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (IGNITE-14702) Inconsistency of the new index when the node falls / deactivates
[ https://issues.apache.org/jira/browse/IGNITE-14702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko reassigned IGNITE-14702: Assignee: Kirill Tkalenko > Inconsistency of the new index when the node falls / deactivates > > > Key: IGNITE-14702 > URL: https://issues.apache.org/jira/browse/IGNITE-14702 > Project: Ignite > Issue Type: Bug > Components: persistence, sql >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.11 > > > At the moment, if we add a new index and the node crashes or is deactivated in > the middle of its construction, the index will be left inconsistent and may cause errors. > We need to either make new index creation a DurableBackgroundTask, or add/modify a > separate mechanism that allows resuming the creation of a new index. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (IGNITE-14649) Introduce annotation processor for message serializers/deserializers
[ https://issues.apache.org/jira/browse/IGNITE-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev reassigned IGNITE-14649: Assignee: Aleksandr Polovtcev > Introduce annotation processor for message serializers/deserializers > > > Key: IGNITE-14649 > URL: https://issues.apache.org/jira/browse/IGNITE-14649 > Project: Ignite > Issue Type: Sub-task >Reporter: Semyon Danilov >Assignee: Aleksandr Polovtcev >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (IGNITE-14713) Calcite. Explain failed for subquery expression.
[ https://issues.apache.org/jira/browse/IGNITE-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stanilovsky Evgeny updated IGNITE-14713: Issue Type: Bug (was: Improvement) > Calcite. Explain failed for subquery expression. > > > Key: IGNITE-14713 > URL: https://issues.apache.org/jira/browse/IGNITE-14713 > Project: Ignite > Issue Type: Bug > Components: sql >Reporter: Stanilovsky Evgeny >Priority: Major > Labels: calcite > > Failure test: > {code:java} >@Test > public void test0() throws Exception { > IgniteCache cache = grid(0).getOrCreateCache( > new CacheConfiguration("test") > .setBackups(1) > .setIndexedTypes(Integer.class, Integer.class) > ); > for (int i = 0; i < 100; i++) > cache.put(i, i); > awaitPartitionMapExchange(); > // Correlated INNER join. > checkQuery("explain plan for SELECT t._val FROM \"test\".Integer t > WHERE t._val < 5 AND " + > "t._key in (SELECT x FROM table(system_range(t._val, t._val))) ") > .check(); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14713) Calcite. Explain failed for subquery expression.
Stanilovsky Evgeny created IGNITE-14713: --- Summary: Calcite. Explain failed for subquery expression. Key: IGNITE-14713 URL: https://issues.apache.org/jira/browse/IGNITE-14713 Project: Ignite Issue Type: Improvement Components: sql Reporter: Stanilovsky Evgeny Failure test:
{code:java}
@Test
public void test0() throws Exception {
    IgniteCache<Integer, Integer> cache = grid(0).getOrCreateCache(
        new CacheConfiguration<Integer, Integer>("test")
            .setBackups(1)
            .setIndexedTypes(Integer.class, Integer.class)
    );

    for (int i = 0; i < 100; i++)
        cache.put(i, i);

    awaitPartitionMapExchange();

    // Correlated INNER join.
    checkQuery("explain plan for SELECT t._val FROM \"test\".Integer t WHERE t._val < 5 AND " +
        "t._key in (SELECT x FROM table(system_range(t._val, t._val))) ")
        .check();
}
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-14088) Implement scalecube transport API over netty
[ https://issues.apache.org/jira/browse/IGNITE-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343310#comment-17343310 ] Ivan Bessonov commented on IGNITE-14088: Reverted due to flaky failures. [~sdanilov] please fix everything and we'll try one more time; I have restored your branch. > Implement scalecube transport API over netty > > > Key: IGNITE-14088 > URL: https://issues.apache.org/jira/browse/IGNITE-14088 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Kalashnikov >Assignee: Semyon Danilov >Priority: Major > Labels: iep-66, ignite-3 > Fix For: 3.0.0-alpha2 > > Time Spent: 16h 40m > Remaining Estimate: 0h > > ScaleCube has its own Netty inside, but the idea is to integrate our extended > Netty into it. This will help us support more features, such as our own > handshake, marshalling, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (IGNITE-14088) Implement scalecube transport API over netty
[ https://issues.apache.org/jira/browse/IGNITE-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reopened IGNITE-14088: > Implement scalecube transport API over netty > > > Key: IGNITE-14088 > URL: https://issues.apache.org/jira/browse/IGNITE-14088 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Kalashnikov >Assignee: Semyon Danilov >Priority: Major > Labels: iep-66, ignite-3 > Fix For: 3.0.0-alpha2 > > Time Spent: 16h 40m > Remaining Estimate: 0h > > ScaleCube has its own Netty inside, but the idea is to integrate our extended > Netty into it. This will help us support more features, such as our own > handshake, marshalling, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (IGNITE-14712) Fix MetaStorageServiceTest after IGNITE-14088
[ https://issues.apache.org/jira/browse/IGNITE-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-14712: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Fix MetaStorageServiceTest after IGNITE-14088 > -- > > Key: IGNITE-14712 > URL: https://issues.apache.org/jira/browse/IGNITE-14712 > Project: Ignite > Issue Type: Bug >Reporter: Semyon Danilov >Assignee: Semyon Danilov >Priority: Blocker > Labels: ignite-3 > Fix For: 3.0.0-alpha2 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (IGNITE-14712) Fix MetaStorageServiceTest after IGNITE-14088
[ https://issues.apache.org/jira/browse/IGNITE-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Semyon Danilov updated IGNITE-14712: Description: IGNITE-14088 introduced ScaleCube transport over the direct marshaller, so the ScaleCube message serialization factory should be registered if a ScaleCube cluster is used. > Fix MetaStorageServiceTest after IGNITE-14088 > -- > > Key: IGNITE-14712 > URL: https://issues.apache.org/jira/browse/IGNITE-14712 > Project: Ignite > Issue Type: Bug >Reporter: Semyon Danilov >Assignee: Semyon Danilov >Priority: Blocker > Labels: ignite-3 > Fix For: 3.0.0-alpha2 > > Time Spent: 20m > Remaining Estimate: 0h > > IGNITE-14088 introduced ScaleCube transport over the direct marshaller, so > the ScaleCube message serialization factory should be registered if a ScaleCube > cluster is used. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-14490) cache.invoke() triggers failure handler and freezes if entry processor is not urideployed
[ https://issues.apache.org/jira/browse/IGNITE-14490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343256#comment-17343256 ] Ignite TC Bot commented on IGNITE-14490: {panel:title=Branch: [pull/8976/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} {panel:title=Branch: [pull/8976/head] Base: [master] : New Tests (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1} {color:#8b}SPI (URI Deploy){color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=6001617]] * {color:#013220}IgniteUriDeploymentTestSuite: UriDeploymentAbsentProcessorClassTest.test - PASSED{color} {panel} [TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6001688buildTypeId=IgniteTests24Java8_RunAll] > cache.invoke() triggers failure handler and freezes if entry processor is not > urideployed > - > > Key: IGNITE-14490 > URL: https://issues.apache.org/jira/browse/IGNITE-14490 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.10 >Reporter: Ilya Kasnacheev >Assignee: Ilya Kasnacheev >Priority: Major > Attachments: LoclerMain.java > > Time Spent: 10m > Remaining Estimate: 0h > > If URI deployment is specified > Caused by: java.lang.ClassNotFoundException: [Ljava.lang.StackTraceElement;" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (IGNITE-14490) cache.invoke() triggers failure handler and freezes if entry processor is not urideployed
[ https://issues.apache.org/jira/browse/IGNITE-14490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Kasnacheev updated IGNITE-14490: - Comment: was deleted (was: {panel:title=Branch: [pull/8976/head] Base: [master] : Possible Blockers (8)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1} {color:#d04437}JDBC Driver{color} [[tests 0 Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=6001609]] {color:#d04437}RDD{color} [[tests 0 Exit Code |https://ci.ignite.apache.org/viewLog.html?buildId=6001611]] {color:#d04437}Cache 9{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=6001650]] * IgniteCacheTestSuite9: TxPartitionCounterStateConsistencyVolatileRebalanceTest.testPartitionConsistencyDuringRebalanceAndConcurrentUpdates_TxDuringPME - Test has low fail rate in base branch 0,0% and is not flaky {color:#d04437}ZooKeeper (Discovery) 1{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=6001623]] * ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_ConcurrentMultinode - Test has low fail rate in base branch 0,0% and is not flaky {color:#d04437}Cache 5{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=6001646]] * IgniteCacheTestSuite5: CacheSerializableTransactionsTest.testTxConflictReadEntry1 - Test has low fail rate in base branch 0,0% and is not flaky * IgniteCacheTestSuite5: IgniteCacheAtomicProtocolTest.testFullAsyncPutRemap - Test has low fail rate in base branch 0,0% and is not flaky {color:#d04437}PDS (Indexing){color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=6001655]] * IgnitePdsWithIndexingCoreTestSuite: IgniteLogicalRecoveryWithParamsTest.testPartiallyCommitedTx_TwoNode_WithoutCpOnNodeStop_SingleNodeTx[nodesCnt=2, singleNodeTx=true, backups=0] - Test has low fail rate in base branch 0,0% and is not flaky {color:#d04437}Cache 1{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=6001642]] * IgniteBinaryCacheTestSuite: GridCacheStopSelfTest.testStopImplicitMvccTransactionsReplicated - Test has low fail rate in base branch 0,0% and is not flaky {panel} {panel:title=Branch: [pull/8976/head] Base: [master] : New Tests (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1} {color:#8b}SPI (URI Deploy){color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=6001617]] * {color:#013220}IgniteUriDeploymentTestSuite: UriDeploymentAbsentProcessorClassTest.test - PASSED{color} {panel} [TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6001688buildTypeId=IgniteTests24Java8_RunAll]) > cache.invoke() triggers failure handler and freezes if entry processor is not > urideployed > - > > Key: IGNITE-14490 > URL: https://issues.apache.org/jira/browse/IGNITE-14490 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.10 >Reporter: Ilya Kasnacheev >Assignee: Ilya Kasnacheev >Priority: Major > Attachments: LoclerMain.java > > Time Spent: 10m > Remaining Estimate: 0h > > If URI deployment is specified > Caused by: java.lang.ClassNotFoundException: [Ljava.lang.StackTraceElement;" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14712) Fix MetaStorageServiceTest after IGNITE-14088
Semyon Danilov created IGNITE-14712: --- Summary: Fix MetaStorageServiceTest after IGNITE-14088 Key: IGNITE-14712 URL: https://issues.apache.org/jira/browse/IGNITE-14712 Project: Ignite Issue Type: Bug Reporter: Semyon Danilov Assignee: Semyon Danilov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (IGNITE-14388) Add affinity key support.
[ https://issues.apache.org/jira/browse/IGNITE-14388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Mashenkov reassigned IGNITE-14388: - Assignee: Andrey Mashenkov > Add affinity key support. > - > > Key: IGNITE-14388 > URL: https://issues.apache.org/jira/browse/IGNITE-14388 > Project: Ignite > Issue Type: Improvement >Reporter: Andrey Mashenkov >Assignee: Andrey Mashenkov >Priority: Major > Labels: iep-54, ignite-3 > > For now, we do not calculate the Row hash at all; it always equals zero. > Let's calculate the Row hash for affinity columns only while assembling the row > in RowAssembler. -- This message was sent by Atlassian Jira (v8.3.4#803005)
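Hashing over affinity columns only means that two rows agreeing on the affinity key always land in the same partition, whatever their other columns hold. A minimal sketch of that idea, assuming nothing about RowAssembler's real layout or hash function (the class and method names here are illustrative only):

```java
import java.util.List;

class AffinityHashSketch {
    /** Combine hashes of the affinity-column values only, in schema order. */
    static int affinityHash(List<Object> affinityColumnValues) {
        int h = 0;
        for (Object v : affinityColumnValues)
            h = 31 * h + (v == null ? 0 : v.hashCode());
        return h;
    }

    public static void main(String[] args) {
        // Rows sharing affinity columns hash identically, regardless of other columns.
        System.out.println(affinityHash(List.of(42, "dept-7"))
            == affinityHash(List.of(42, "dept-7"))); // prints true
    }
}
```

The real implementation would compute this incrementally while assembling the row, so the hash is never recomputed from the serialized form.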
[jira] [Updated] (IGNITE-14088) Implement scalecube transport API over netty
[ https://issues.apache.org/jira/browse/IGNITE-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-14088: --- Ignite Flags: Release Notes Required (was: Docs Required,Release Notes Required) > Implement scalecube transport API over netty > > > Key: IGNITE-14088 > URL: https://issues.apache.org/jira/browse/IGNITE-14088 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Kalashnikov >Assignee: Semyon Danilov >Priority: Major > Labels: iep-66, ignite-3 > Fix For: 3.0.0-alpha2 > > Time Spent: 16h 40m > Remaining Estimate: 0h > > ScaleCube has its own Netty inside, but the idea is to integrate our extended > Netty into it. This will help us support more features, such as our own > handshake, marshalling, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (IGNITE-14088) Implement scalecube transport API over netty
[ https://issues.apache.org/jira/browse/IGNITE-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-14088: --- Ignite Flags: (was: Release Notes Required) > Implement scalecube transport API over netty > > > Key: IGNITE-14088 > URL: https://issues.apache.org/jira/browse/IGNITE-14088 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Kalashnikov >Assignee: Semyon Danilov >Priority: Major > Labels: iep-66, ignite-3 > Fix For: 3.0.0-alpha2 > > Time Spent: 16h 40m > Remaining Estimate: 0h > > ScaleCube has its own Netty inside, but the idea is to integrate our extended > Netty into it. This will help us support more features, such as our own > handshake, marshalling, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-14088) Implement scalecube transport API over netty
[ https://issues.apache.org/jira/browse/IGNITE-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343230#comment-17343230 ] Ivan Bessonov commented on IGNITE-14088: [~sdanilov] thank you for the contribution, I'll merge your changes. > Implement scalecube transport API over netty > > > Key: IGNITE-14088 > URL: https://issues.apache.org/jira/browse/IGNITE-14088 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Kalashnikov >Assignee: Semyon Danilov >Priority: Major > Labels: iep-66, ignite-3 > Time Spent: 16.5h > Remaining Estimate: 0h > > ScaleCube has its own Netty inside, but the idea is to integrate our extended > Netty into it. This will help us support more features, such as our own > handshake, marshalling, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-14542) Calcite engine. Need to support TableFunctions / SYSTEM_RANGE dynamic table
[ https://issues.apache.org/jira/browse/IGNITE-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343199#comment-17343199 ] Konstantin Orlov commented on IGNITE-14542: --- [~alex_pl], LGTM! > Calcite engine. Need to support TableFunctions / SYSTEM_RANGE dynamic table > --- > > Key: IGNITE-14542 > URL: https://issues.apache.org/jira/browse/IGNITE-14542 > Project: Ignite > Issue Type: Bug > Components: sql >Reporter: Taras Ledkov >Assignee: Aleksey Plekhanov >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > A lot of cases require dynamic range data source. > Tests: > {{aggregate/aggregates/test_perfect_ht.test}} > {{aggregate/aggregates/test_string_agg_many_groups.test_slow}} > {{aggregate/aggregates/test_sum.test}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IGNITE-8719) Index left partially built if a node crashes during index create or rebuild
[ https://issues.apache.org/jira/browse/IGNITE-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343198#comment-17343198 ] Stanilovsky Evgeny commented on IGNITE-8719: [~ktkale...@gridgain.com] please check my comments; if you are OK with my proposals, I would like to review it once more after the fixes. Thanks! > Index left partially built if a node crashes during index create or rebuild > --- > > Key: IGNITE-8719 > URL: https://issues.apache.org/jira/browse/IGNITE-8719 > Project: Ignite > Issue Type: Bug > Components: sql >Reporter: Alexey Goncharuk >Assignee: Kirill Tkalenko >Priority: Critical > Fix For: 2.11 > > Attachments: IndexRebuildAfterNodeCrashTest.java, > IndexRebuildingTest.java > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently, we do not have any state associated with the index tree. Consider > the following scenario: > 1) Start node, put some data > 2) start CREATE INDEX operation > 3) Wait for a checkpoint and stop node before index create finished > 4) Restart node > Since the checkpoint finished, the new index tree will be persisted to the > disk, but not all data will be present in the index. > We should somehow store information about initializing index tree and mark it > valid only after all data is indexed. The state should be persisted as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (IGNITE-14711) Client discovery thread interrupt/stop causes endless communication reconnect attempt
[ https://issues.apache.org/jira/browse/IGNITE-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Kasnacheev updated IGNITE-14711: - Attachment: IgniteDiscoveryThreadKillingTest.java > Client discovery thread interrupt/stop causes endless communication reconnect > attempt > - > > Key: IGNITE-14711 > URL: https://issues.apache.org/jira/browse/IGNITE-14711 > Project: Ignite > Issue Type: Bug > Components: networking >Affects Versions: 2.10 >Reporter: Ilya Kasnacheev >Priority: Major > Attachments: IgniteDiscoveryThreadKillingTest.java > > > Original issue: if the tcp-client-disco-sock-reader thread dies on a client node, > the node will never disconnect from the cluster despite NODE_FAILED, and will > endlessly try to open communication connections to the server while getting > "Remote node does not observe current node in topology" exceptions on the client > and "Close incoming connection, unknown node" on the server. > Generalized issue: stop()ing or interrupt()ing discovery threads causes the > cluster to hang in many cases, whereas it is expected that any such node will: > * Restart the thread and continue normally > * Disconnect from the cluster to re-establish the discovery connection > * Stop and close all remaining threads. > See the attached reproducer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14711) Client discovery thread interrupt/stop causes endless communication reconnect attempt
Ilya Kasnacheev created IGNITE-14711: Summary: Client discovery thread interrupt/stop causes endless communication reconnect attempt Key: IGNITE-14711 URL: https://issues.apache.org/jira/browse/IGNITE-14711 Project: Ignite Issue Type: Bug Components: networking Affects Versions: 2.10 Reporter: Ilya Kasnacheev Original issue: if the tcp-client-disco-sock-reader thread dies on a client node, the node will never disconnect from the cluster despite NODE_FAILED, and will endlessly try to open communication connections to the server while getting "Remote node does not observe current node in topology" exceptions on the client and "Close incoming connection, unknown node" on the server. Generalized issue: stop()ing or interrupt()ing discovery threads causes the cluster to hang in many cases, whereas it is expected that any such node will: * Restart the thread and continue normally * Disconnect from the cluster to re-establish the discovery connection * Stop and close all remaining threads. See the attached reproducer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14710) Cache API operations throw IgniteClientDisconnectedException
Ilya Kasnacheev created IGNITE-14710: Summary: Cache API operations throw IgniteClientDisconnectedException Key: IGNITE-14710 URL: https://issues.apache.org/jira/browse/IGNITE-14710 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 2.10 Reporter: Ilya Kasnacheev It is possible for the Cache.put() operation to throw a raw IgniteClientDisconnectedException:
{code}
class org.apache.ignite.IgniteClientDisconnectedException: Client node disconnected: discovery.IgniteDiscoveryThreadInterruptTest1
	at org.apache.ignite.internal.GridKernalGatewayImpl.readLock(GridKernalGatewayImpl.java:93)
	at org.apache.ignite.internal.IgniteKernal.guard(IgniteKernal.java:4163)
	at org.apache.ignite.internal.IgniteKernal.transactions(IgniteKernal.java:3173)
	at org.apache.ignite.internal.processors.cache.GridCacheGateway.checkAtomicOpsInTx(GridCacheGateway.java:363)
	at org.apache.ignite.internal.processors.cache.GridCacheGateway.onEnter(GridCacheGateway.java:262)
	at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:177)
	at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1625)
	at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:853)
	at org.apache.ignite.spi.discovery.IgniteDiscoveryThreadInterruptTest.run(IgniteDiscoveryThreadInterruptTest.java:117)
	at org.apache.ignite.spi.discovery.IgniteDiscoveryThreadInterruptTest.testStopClientSockWriter(IgniteDiscoveryThreadInterruptTest.java:89)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2428)
	at java.lang.Thread.run(Thread.java:748)
{code}
This is incorrect behavior. cache.put() normally throws only CacheException, and it should always throw CacheException with IgniteClientDisconnectedException in getCause(). Currently we are both violating JSR 107 and forcing users to handle both CacheException and IgniteClientDisconnectedException, then check the former to see whether its cause is the latter, and duplicate the future recovery code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
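The contract the issue asks for is "wrap everything in one checked surface, keep the detail in the cause". The sketch below illustrates that pattern with invented stand-in classes; it is not Ignite's real proxy code, and the class names (CacheFacadeSketch, CacheOpException, ClientDisconnectedException) are placeholders.

```java
// Stand-ins for CacheException / IgniteClientDisconnectedException:
class ClientDisconnectedException extends RuntimeException { }

class CacheOpException extends RuntimeException {
    CacheOpException(Throwable cause) { super(cause); }
}

class CacheFacadeSketch {
    private final boolean disconnected;

    CacheFacadeSketch(boolean disconnected) { this.disconnected = disconnected; }

    void put(Object k, Object v) {
        try {
            if (disconnected)
                throw new ClientDisconnectedException(); // raw internal failure
            // ... the actual put would go here ...
        }
        catch (ClientDisconnectedException e) {
            // Always wrap: callers see one exception type, per JSR 107.
            throw new CacheOpException(e);
        }
    }

    public static void main(String[] args) {
        CacheFacadeSketch cache = new CacheFacadeSketch(true);
        try {
            cache.put("k", "v");
        }
        catch (CacheOpException e) {
            // One catch block; the disconnect is discoverable via getCause().
            System.out.println("disconnected: "
                + (e.getCause() instanceof ClientDisconnectedException)); // prints "disconnected: true"
        }
    }
}
```

With this shape, client code needs a single catch for the cache operation and checks the cause only when it actually wants to wait for reconnect.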
[jira] [Assigned] (IGNITE-14292) Change permissions required to create/destroy caches in GridRestProcessor
[ https://issues.apache.org/jira/browse/IGNITE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergei Ryzhov reassigned IGNITE-14292: -- Assignee: Sergei Ryzhov > Change permissions required to create/destroy caches in GridRestProcessor > - > > Key: IGNITE-14292 > URL: https://issues.apache.org/jira/browse/IGNITE-14292 > Project: Ignite > Issue Type: Improvement > Components: security >Affects Versions: 2.9.1 >Reporter: Andrey Kuznetsov >Assignee: Sergei Ryzhov >Priority: Major > > {{GridRestProcessor}} authorizes {{ADMIN_CACHE}} permission before cache > creation/destruction. This is inconsistent with thin client connector > behavior and looks counterintuitive. {{ADMIN_CACHE}} should be replaced with > {{CACHE_CREATE}} and {{CACHE_DESTROY}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-14709) Make ConfigurationStorage#readAll async
Mirza Aliev created IGNITE-14709: Summary: Make ConfigurationStorage#readAll async Key: IGNITE-14709 URL: https://issues.apache.org/jira/browse/IGNITE-14709 Project: Ignite Issue Type: Improvement Reporter: Mirza Aliev Currently we face a problem: when a node starts, it hangs at the phase where we register DistributedConfigurationStorage. This happens because running ConfigurationChanger#register requires ConfigurationStorage#readAll, which in turn calls MetaStorageServiceImpl#range; but range waits on a future until cluster init happens, so the whole node start process hangs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (IGNITE-14709) Make ConfigurationStorage#readAll async
[ https://issues.apache.org/jira/browse/IGNITE-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-14709: - Description: Currently we face a problem: when a node starts, it hangs at the phase where we register {{DistributedConfigurationStorage}}. This happens because running {{ConfigurationChanger#register}} requires {{ConfigurationStorage#readAll}}, which in turn calls {{MetaStorageServiceImpl#range}}; but range waits on a future until cluster init happens, so the whole node start process hangs. was: Currently, we faced with a problem when a node starts it is hanged on phase when we register DistributedConfigurationStorage. It happens because when ConfigurationChanger#register is run it requires ConfigurationStorage#readAll, this, in turn, calls MetaStorageServiceImpl#range, but the range is waiting on future until cluster init happens, so all process of starting node is hanged. > Make ConfigurationStorage#readAll async > --- > > Key: IGNITE-14709 > URL: https://issues.apache.org/jira/browse/IGNITE-14709 > Project: Ignite > Issue Type: Improvement >Reporter: Mirza Aliev >Priority: Major > Labels: ignite-3 > > Currently we face a problem: when a node starts, it hangs at the phase > where we register {{DistributedConfigurationStorage}}. This happens because running > {{ConfigurationChanger#register}} requires {{ConfigurationStorage#readAll}}, > which in turn calls {{MetaStorageServiceImpl#range}}; but range waits on a future until > cluster init happens, so the whole node start process hangs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
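Making readAll asynchronous means registration returns immediately and the snapshot is applied whenever the metastorage becomes available. A minimal sketch of that idea, assuming nothing about the real ConfigurationStorage signatures (readAllAsync and the stand-in "metastorage ready" future are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class AsyncReadAllSketch {
    // Stand-in for MetaStorageServiceImpl#range: a future that completes
    // only once cluster init has happened.
    static CompletableFuture<Map<String, String>> readAllAsync(
            CompletableFuture<Map<String, String>> metastorageReady) {
        // Registration no longer blocks on this future; it is consumed later.
        return metastorageReady;
    }

    public static void main(String[] args) {
        CompletableFuture<Map<String, String>> ready = new CompletableFuture<>();
        CompletableFuture<Map<String, String>> snapshot = readAllAsync(ready);

        // Node start continues here instead of hanging on the metastorage...
        snapshot.thenAccept(m -> System.out.println("applied " + m.size() + " key(s)"));

        // ...and cluster init completes later, triggering the callback.
        ready.complete(Map.of("a", "1")); // prints "applied 1 key(s)"
    }
}
```

The design choice is simply to push the blocking point out of the registration path and into a continuation, so node startup and cluster init can proceed concurrently.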
[jira] [Assigned] (IGNITE-14708) Add an ability to track request handling completion in GridRestProcessor
[ https://issues.apache.org/jira/browse/IGNITE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergei Ryzhov reassigned IGNITE-14708: -- Assignee: Sergei Ryzhov > Add an ability to track request handling completion in GridRestProcessor > > > Key: IGNITE-14708 > URL: https://issues.apache.org/jira/browse/IGNITE-14708 > Project: Ignite > Issue Type: Improvement > Components: rest > Affects Versions: 2.10 > Reporter: Andrey Kuznetsov > Assignee: Sergei Ryzhov > Priority: Major > Fix For: 2.11 > > > It would be useful to have a way to perform some actions when request handling in {{GridRestProcessor}} completes, either normally or exceptionally. This would allow third-party plugins to add tracking/monitoring behavior to request processing. > Probable implementation: allow registering a custom listener for {{handleAsync0()}} results. The listener should accept both the request and the result (response or exception).
[jira] [Created] (IGNITE-14708) Add an ability to track request handling completion in GridRestProcessor
Andrey Kuznetsov created IGNITE-14708: - Summary: Add an ability to track request handling completion in GridRestProcessor Key: IGNITE-14708 URL: https://issues.apache.org/jira/browse/IGNITE-14708 Project: Ignite Issue Type: Improvement Components: rest Affects Versions: 2.10 Reporter: Andrey Kuznetsov Fix For: 2.11 It would be useful to have a way to perform some actions when request handling in {{GridRestProcessor}} completes, either normally or exceptionally. This would allow third-party plugins to add tracking/monitoring behavior to request processing. Probable implementation: allow registering a custom listener for {{handleAsync0()}} results. The listener should accept both the request and the result (response or exception).
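The "probable implementation" described above can be sketched as a completion hook on the async result. This is an illustrative sketch under assumptions: RestRequest, RestResponse, and RestProcessorSketch are simplified placeholders, not Ignite's real GridRestRequest/GridRestResponse/GridRestProcessor types.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical sketch of the proposed listener hook: plugins register a
// callback that receives the request together with its outcome (response
// or exception) once handling completes.
class RestProcessorSketch {
    record RestRequest(String cmd) {}
    record RestResponse(int code) {}

    private final List<BiConsumer<RestRequest, Object>> lsnrs = new CopyOnWriteArrayList<>();

    /** Registers a listener invoked after each request completes. */
    void addCompletionListener(BiConsumer<RestRequest, Object> lsnr) {
        lsnrs.add(lsnr);
    }

    CompletableFuture<RestResponse> handleAsync(RestRequest req) {
        CompletableFuture<RestResponse> fut = handleAsync0(req);

        // Notify listeners with either the response or the exception,
        // covering both normal and exceptional completion.
        fut.whenComplete((res, err) ->
            lsnrs.forEach(l -> l.accept(req, err != null ? err : res)));

        return fut;
    }

    private CompletableFuture<RestResponse> handleAsync0(RestRequest req) {
        // Stand-in for the real request handling logic.
        return CompletableFuture.completedFuture(new RestResponse(200));
    }
}
```

A plugin would then call addCompletionListener once at startup and receive every (request, outcome) pair for tracking or monitoring.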
[jira] [Updated] (IGNITE-14684) Stopping node at the end of checkpoint can cause "Critical system error"
[ https://issues.apache.org/jira/browse/IGNITE-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko updated IGNITE-14684: - Release Note: Fixed a node failure caused by deleting DurableBackgroundTasks at the end of a checkpoint while the node is stopping. > Stopping node at the end of checkpoint can cause "Critical system error" > > > Key: IGNITE-14684 > URL: https://issues.apache.org/jira/browse/IGNITE-14684 > Project: Ignite > Issue Type: Bug > Components: persistence > Reporter: Maria Makedonskaya > Assignee: Kirill Tkalenko > Priority: Major > Fix For: 2.11 > > Time Spent: 10m > Remaining Estimate: 0h > > The checkpoint listener org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor#afterCheckpointEnd, which triggers at the end of the checkpoint process, cannot take the checkpoint read lock while the node is stopping. > Run the test (see the exception in the log): org.apache.ignite.internal.processors.cache.persistence.db.LongDestroyDurableBackgroundTaskTest#testDestroyTaskLifecycle > {noformat} > [2021-05-05 > 15:41:10,907][ERROR][db-checkpoint-thread-#87%db.LongDestroyDurableBackgroundTaskTest0%][root] > Critical system error detected. Will be handled accordingly to configured > handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler > [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, > SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext > [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Failed to > perform cache update: node is stopping.]] > class org.apache.ignite.IgniteException: Failed to perform cache update: node > is stopping. 
> at > org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:127) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1583) > at > org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.metaStorageOperation(DurableBackgroundTasksProcessor.java:335) > at > org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.afterCheckpointEnd(DurableBackgroundTasksProcessor.java:152) > at > org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointEnd(CheckpointWorkflow.java:606) > at > org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:479) > at > org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:282) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) > at java.lang.Thread.run(Thread.java:748) > Caused by: class org.apache.ignite.internal.NodeStoppingException: Failed to > perform cache update: node is stopping. > ... 9 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
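The shape of the fix implied by the release note — skipping the metastorage work instead of escalating to a critical failure when the lock cannot be taken during stop — can be sketched as below. All names here are simplified placeholders for illustration, not the real Ignite classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a checkpoint-end listener checks whether the node is
// stopping before attempting work under the checkpoint read lock, and skips
// the DurableBackgroundTask cleanup rather than throwing, which previously
// surfaced as a SYSTEM_WORKER_TERMINATION critical failure.
class CheckpointEndListenerSketch {
    private final AtomicBoolean stopping = new AtomicBoolean();

    /** Called when node shutdown begins. */
    void onNodeStop() {
        stopping.set(true);
    }

    /**
     * Runs the cleanup that normally happens after a checkpoint ends.
     * Returns true if the cleanup ran, false if it was skipped because
     * the node is stopping.
     */
    boolean afterCheckpointEnd(Runnable metaStorageCleanup) {
        if (stopping.get())
            return false; // Node is stopping: skip instead of failing.

        metaStorageCleanup.run();
        return true;
    }
}
```

Skipping is safe in this scenario because the deferred cleanup can be retried on the next checkpoint after restart.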
[jira] [Commented] (IGNITE-14684) Stopping node at the end of checkpoint can cause "Critical system error"
[ https://issues.apache.org/jira/browse/IGNITE-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343040#comment-17343040 ] Ivan Bessonov commented on IGNITE-14684: [~ktkale...@gridgain.com] looks good, thank you for the contribution! > Stopping node at the end of checkpoint can cause "Critical system error"
[jira] [Updated] (IGNITE-14684) Stopping node at the end of checkpoint can cause "Critical system error"
[ https://issues.apache.org/jira/browse/IGNITE-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko updated IGNITE-14684: - Reviewer: Ivan Bessonov > Stopping node at the end of checkpoint can cause "Critical system error"
[jira] [Commented] (IGNITE-14684) Stopping node at the end of checkpoint can cause "Critical system error"
[ https://issues.apache.org/jira/browse/IGNITE-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343039#comment-17343039 ] Kirill Tkalenko commented on IGNITE-14684: -- [~ibessonov] Please review the code. > Stopping node at the end of checkpoint can cause "Critical system error"
[jira] [Updated] (IGNITE-14684) Stopping node at the end of checkpoint can cause "Critical system error"
[ https://issues.apache.org/jira/browse/IGNITE-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko updated IGNITE-14684: - Fix Version/s: 2.11 > Stopping node at the end of checkpoint can cause "Critical system error"