[jira] [Updated] (IGNITE-17190) Calcite engine. Unbind statistics from H2
[ https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Daschinsky updated IGNITE-17190:
-------------------------------------
    Release Note:
        Move SQL statistics to the core module.
        Fix the SQL ANALYZE, REFRESH STATISTICS and DROP STATISTICS commands.

  was:
        * Move SQL statistics to core module
        * Fix SQL ANALYZE, REFRESH STATISTICS and DROP STATISTICS command

> Calcite engine. Unbind statistics from H2
> -----------------------------------------
>
>                 Key: IGNITE-17190
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17190
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Ivan Daschinsky
>            Priority: Major
>              Labels: calcite, calcite2-required, ignite-osgi
>             Fix For: 2.14
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for
> example). We should unbind statistics from H2 and move statistics to the core
> module to be able to use them in the calcite module without a dependency on H2.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (IGNITE-17190) Calcite engine. Unbind statistics from H2
[ https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Daschinsky updated IGNITE-17190:
-------------------------------------
    Release Note:
        * Move SQL statistics to core module
        * Fix SQL ANALYZE, REFRESH STATISTICS and DROP STATISTICS commands

> Calcite engine. Unbind statistics from H2
> -----------------------------------------
>
>                 Key: IGNITE-17190
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17190
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Ivan Daschinsky
>            Priority: Major
>              Labels: calcite, calcite2-required, ignite-osgi
>             Fix For: 2.14
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for
> example). We should unbind statistics from H2 and move statistics to the core
> module to be able to use them in the calcite module without a dependency on H2.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (IGNITE-17190) Calcite engine. Unbind statistics from H2
[ https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581992#comment-17581992 ]

Ivan Daschinsky commented on IGNITE-17190:
------------------------------------------

Merged to master, cherry-picked to ignite-2.14.

> Calcite engine. Unbind statistics from H2
> -----------------------------------------
>
>                 Key: IGNITE-17190
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17190
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Ivan Daschinsky
>            Priority: Major
>              Labels: calcite, calcite2-required, ignite-osgi
>             Fix For: 2.14
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for
> example). We should unbind statistics from H2 and move statistics to the core
> module to be able to use them in the calcite module without a dependency on H2.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (IGNITE-17190) Calcite engine. Unbind statistics from H2
[ https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581983#comment-17581983 ]

Ivan Daschinsky commented on IGNITE-17190:
------------------------------------------

[~alex_pl] Thanks for the review.

> Calcite engine. Unbind statistics from H2
> -----------------------------------------
>
>                 Key: IGNITE-17190
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17190
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Ivan Daschinsky
>            Priority: Major
>              Labels: calcite, calcite2-required, ignite-osgi
>             Fix For: 2.14
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for
> example). We should unbind statistics from H2 and move statistics to the core
> module to be able to use them in the calcite module without a dependency on H2.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (IGNITE-17190) Calcite engine. Unbind statistics from H2
[ https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581982#comment-17581982 ]

Ivan Daschinsky commented on IGNITE-17190:
------------------------------------------

Two failures are not related to the patch, just tried to rerun the tests.

> Calcite engine. Unbind statistics from H2
> -----------------------------------------
>
>                 Key: IGNITE-17190
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17190
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Ivan Daschinsky
>            Priority: Major
>              Labels: calcite, calcite2-required, ignite-osgi
>             Fix For: 2.14
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for
> example). We should unbind statistics from H2 and move statistics to the core
> module to be able to use them in the calcite module without a dependency on H2.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (IGNITE-17190) Calcite engine. Unbind statistics from H2
[ https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581981#comment-17581981 ]

Ignite TC Bot commented on IGNITE-17190:
----------------------------------------

{panel:title=Branch: [pull/10175/head] Base: [master] : Possible Blockers (2)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Cache 6{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=6739921]]
* IgniteCacheTestSuite6: CacheExchangeMergeTest.testMergeServersFail1_1 - Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Platform .NET (Windows){color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=6739863]]
* exe: ClientHeartbeatTest.TestServerDisconnectsIdleClientWithoutHeartbeats - Test has low fail rate in base branch 0,0% and is not flaky
{panel}

{panel:title=Branch: [pull/10175/head] Base: [master] : New Tests (12)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}Queries 3 (lazy=true){color} [[tests 6|https://ci.ignite.apache.org/viewLog.html?buildId=6739346]]
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: SqlStatisticsCommandTests.statisticsLexemaTest - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: SqlStatisticsCommandTests.testRefreshNotExistStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: SqlStatisticsCommandTests.testAnalyze - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: SqlStatisticsCommandTests.testDropNotExistStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: SqlStatisticsCommandTests.testDropStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: SqlStatisticsCommandTests.testRefreshStatistics - PASSED{color}

{color:#8b}Queries 3{color} [[tests 6|https://ci.ignite.apache.org/viewLog.html?buildId=6739345]]
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: SqlStatisticsCommandTests.statisticsLexemaTest - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: SqlStatisticsCommandTests.testRefreshNotExistStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: SqlStatisticsCommandTests.testAnalyze - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: SqlStatisticsCommandTests.testDropNotExistStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: SqlStatisticsCommandTests.testDropStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: SqlStatisticsCommandTests.testRefreshStatistics - PASSED{color}
{panel}

[TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6739371&buildTypeId=IgniteTests24Java8_RunAll]

> Calcite engine. Unbind statistics from H2
> -----------------------------------------
>
>                 Key: IGNITE-17190
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17190
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Ivan Daschinsky
>            Priority: Major
>              Labels: calcite, calcite2-required, ignite-osgi
>             Fix For: 2.14
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for
> example). We should unbind statistics from H2 and move statistics to the core
> module to be able to use them in the calcite module without a dependency on H2.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
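[Editorial note] The ticket's point is replacing H2's {{Value}} with an engine-neutral representation so the core module carries no H2 dependency. A minimal sketch of that decoupling idea, with invented names (this is not Ignite's actual statistics API):

```java
// Hypothetical engine-neutral statistics value: wraps a plain JDK
// Comparable instead of org.h2.value.Value, so the core module has no
// H2 dependency and the calcite module can consume it directly.
public final class StatsValue implements Comparable<StatsValue> {
    private final Comparable<Object> inner; // boxed JDK type (Long, String, ...)

    @SuppressWarnings("unchecked")
    public StatsValue(Comparable<?> inner) {
        this.inner = (Comparable<Object>) inner;
    }

    @Override public int compareTo(StatsValue other) {
        return inner.compareTo(other.inner);
    }

    public static void main(String[] args) {
        StatsValue min = new StatsValue(1L);
        StatsValue max = new StatsValue(100L);
        System.out.println(min.compareTo(max) < 0);
    }
}
```

With a wrapper like this, min/max column statistics can be compared and serialized by either SQL engine without touching H2 classes.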
[jira] [Commented] (IGNITE-16136) System Thread pool starvation and out of memory
[ https://issues.apache.org/jira/browse/IGNITE-16136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581965#comment-17581965 ]

Maxim Muzafarov commented on IGNITE-16136:
------------------------------------------

Folks, I'm able to reproduce the issue. I will fix it soon.

> System Thread pool starvation and out of memory
> -----------------------------------------------
>
>                 Key: IGNITE-16136
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16136
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.7.6
>            Reporter: David Albrecht
>            Assignee: Maxim Muzafarov
>            Priority: Critical
>              Labels: ise
>             Fix For: 2.14
>         Attachments: configuration.zip, image-2021-12-15-21-13-43-775.png, image-2021-12-15-21-17-47-652.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are experiencing thread pool starvation and, after some time, out-of-memory
> exceptions in some of our Ignite client nodes, while the server node seems to
> be running without any problems. It seems like all sys threads are stuck when
> calling MarshallerContextImpl.getClassName, which in turn leads to a growing
> worker queue.
>
> First warnings regarding the thread pool starvation:
> {code:java}
> 10.12.21 11:22:34.603 [WARN ] IgniteKernal.warning(127): Possible thread pool starvation detected (no task completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:27:34.654 [WARN ] IgniteKernal.warning(127): Possible thread pool starvation detected (no task completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:32:34.713 [WARN ] IgniteKernal.warning(127): Possible thread pool starvation detected (no task completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:37:34.764 [WARN ] IgniteKernal.warning(127): Possible thread pool starvation detected (no task completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:42:34.796 [WARN ] IgniteKernal.warning(127): Possible thread pool starvation detected (no task completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:47:34.839 [WARN ] IgniteKernal.warning(127): Possible thread pool starvation detected (no task completed in last 3ms, is system thread pool size large enough?)
> {code}
> Out of memory error leading to a crash of the application:
> {code}
> Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "https-openssl-nio-16443-ClientPoller"
> Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ajp-nio-16009-ClientPoller"
> 11-Dec-2021 03:07:24.446 SEVERE [Catalina-utility-1] org.apache.coyote.AbstractProtocol.startAsyncTimeout Error processing async timeouts
> java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
> {code}
> The queue full of messages:
> !image-2021-12-15-21-17-47-652.png!
> It seems like all sys threads are stuck while waiting at:
> {code}
> sys-#170
> at jdk.internal.misc.Unsafe.park(ZJ)V (Native Method)
> at java.util.concurrent.locks.LockSupport.park()V (LockSupport.java:323)
> at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(Z)Ljava/lang/Object; (GridFutureAdapter.java:178)
> at org.apache.ignite.internal.util.future.GridFutureAdapter.get()Ljava/lang/Object; (GridFutureAdapter.java:141)
> at org.apache.ignite.internal.MarshallerContextImpl.getClassName(BI)Ljava/lang/String; (MarshallerContextImpl.java:379)
> at org.apache.ignite.internal.MarshallerContextImpl.getClass(ILjava/lang/ClassLoader;)Ljava/lang/Class; (MarshallerContextImpl.java:344)
> at org.apache.ignite.internal.marshaller.optimized.OptimizedMarshallerUtils.classDescriptor(Ljava/util/concurrent/ConcurrentMap;ILjava/lang/ClassLoader;Lorg/apache/ignite/marshaller/MarshallerContext;Lorg/apache/ignite/internal/marshaller/optimized/OptimizedMarshallerIdMapper;)Lorg/apache/ignite/internal/marshaller/optimized/OptimizedClassDescriptor; (OptimizedMarshallerUtils.java:264)
> at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObject0()Ljava/lang/Object; (OptimizedObjectInputStream.java:341)
> at org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride()Ljava/lang/Object; (OptimizedObjectInputStream.java:198)
> at java.io.ObjectInputStream.readObject(Ljava/lang/Class;)Ljava/lang/Object; (ObjectInputStream.java:484)
> at java.io.ObjectInputStream.readObject()Ljava/lang/Object;
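[Editorial note] The stack trace above shows every system-pool thread parked on a future, so whatever task would complete those futures can never be scheduled: the pool starves. The pattern is easy to reproduce with plain JDK primitives; the sketch below is illustrative, not Ignite code:

```java
import java.util.concurrent.*;

// Thread pool starvation in miniature: both workers of a fixed pool block
// on a future, and the task that would complete it is stuck behind them
// in the queue, so the pool can never make progress.
public class StarvationDemo {
    public static boolean starves() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletableFuture<String> answer = new CompletableFuture<>();
        try {
            for (int i = 0; i < 2; i++)
                pool.submit(() -> answer.join());    // occupies a worker forever
            pool.submit(() -> answer.complete("x")); // queued, never gets a thread
            answer.get(500, TimeUnit.MILLISECONDS);
            return false; // would mean the pool made progress
        } catch (TimeoutException e) {
            return true;  // starved: the completing task never ran
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("starved: " + starves());
    }
}
```

The usual remedies are the ones the warning hints at: a larger pool, or (better) never blocking a pool thread on work that is dispatched to the same pool.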
[jira] [Commented] (IGNITE-17349) Ignite3 CLI output formatting
[ https://issues.apache.org/jira/browse/IGNITE-17349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581925#comment-17581925 ]

Vyacheslav Koptilin commented on IGNITE-17349:
----------------------------------------------

Hello [~aleksandr.pakhomov],
In general, the patch looks good to me. I will check it again and merge.

> Ignite3 CLI output formatting
> -----------------------------
>
>                 Key: IGNITE-17349
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17349
>             Project: Ignite
>          Issue Type: Task
>          Components: cli
>            Reporter: Aleksandr
>            Assignee: Aleksandr
>            Priority: Major
>              Labels: ignite-3, ignite-3-cli-tool
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Ignite3 CLI is currently not consistent from a formatting/styles perspective.
> Messages about what went wrong differ from each other. In some places 'Done!' is
> a marker of a successful operation ({{ignite bootstrap}}); in others it is just a
> sentence notifying that something is done ({{ignite connect}}). Tables are
> rendered with different borders for the {{ignite bootstrap}}, {{ignite node
> list}} and {{ignite topology}} commands.
> The goal of this ticket is to develop user-facing interface components and
> use them in the CLI code. The list of components is also a part of this
> ticket, but here are some of them:
> - problem json render
> - table render
> - success action render
> - suggestion render.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
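[Editorial note] The component list in the ticket above amounts to one renderer interface with a single style decision baked into each implementation. A minimal sketch of that shape (names are invented for illustration and are not Ignite 3's actual CLI API):

```java
// Hypothetical CLI renderer components: each output style is decided in
// exactly one place, so every command produces consistent formatting.
interface Renderer<T> {
    String render(T value);
}

final class SuccessRenderer implements Renderer<String> {
    @Override public String render(String msg) {
        return "[OK] " + msg; // the one success marker used everywhere
    }
}

final class TableRenderer implements Renderer<String[][]> {
    @Override public String render(String[][] rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows)                     // one border/separator
            sb.append(String.join(" | ", row)).append('\n'); // convention
        return sb.toString();
    }
}

public class RenderDemo {
    public static void main(String[] args) {
        System.out.print(new SuccessRenderer().render("Node started"));
    }
}
```

Commands then depend on the interfaces, so changing a border style or a success marker is a one-line edit rather than a sweep across every command.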
[jira] [Comment Edited] (IGNITE-17477) Redesign RAFT commands in accordance with replication layer
[ https://issues.apache.org/jira/browse/IGNITE-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581917#comment-17581917 ]

Vladislav Pyatkov edited comment on IGNITE-17477 at 8/19/22 3:22 PM:
--------------------------------------------------------------------

Merged in the TX branch: [ignite3_tx|https://github.com/gridgain/apache-ignite-3/commits/ignite3_tx]
b875396e6beda6b85cbb67bf77399663087b49c4

was (Author: v.pyatkov):
Merged in the TX branch: b875396e6beda6b85cbb67bf77399663087b49c4

> Redesign RAFT commands in accordance with replication layer
> -----------------------------------------------------------
>
>                 Key: IGNITE-17477
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17477
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3, transaction3_rw
>
> After we have implemented a replication layer, some of the RAFT commands have
> become useless: _GetAndDeleteCommand, UpsertCommand, GetAllCommand,
> GetCommand, DeleteExactCommand_ (the list can change), and others require
> modification, because every RAFT command should accept a _rowId_ and never
> try to match a row to its id (that is already done by the replication layer).
> We also need to extract the primary index (for now, it is a map) from the RAFT
> state machine. The replication layer will use it for reads, while the state
> machine will use it for modifications only.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (IGNITE-17477) Redesign RAFT commands in accordance with replication layer
[ https://issues.apache.org/jira/browse/IGNITE-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581917#comment-17581917 ]

Vladislav Pyatkov commented on IGNITE-17477:
--------------------------------------------

Merged in the TX branch: b875396e6beda6b85cbb67bf77399663087b49c4

> Redesign RAFT commands in accordance with replication layer
> -----------------------------------------------------------
>
>                 Key: IGNITE-17477
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17477
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3, transaction3_rw
>
> After we have implemented a replication layer, some of the RAFT commands have
> become useless: _GetAndDeleteCommand, UpsertCommand, GetAllCommand,
> GetCommand, DeleteExactCommand_ (the list can change), and others require
> modification, because every RAFT command should accept a _rowId_ and never
> try to match a row to its id (that is already done by the replication layer).
> We also need to extract the primary index (for now, it is a map) from the RAFT
> state machine. The replication layer will use it for reads, while the state
> machine will use it for modifications only.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (IGNITE-17428) Race between creating table and getting table, between creating schema and getting schema
[ https://issues.apache.org/jira/browse/IGNITE-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581910#comment-17581910 ]

Vladislav Pyatkov commented on IGNITE-17428:
--------------------------------------------

Merged e2b4205b018d51eea26aada42b1389e92a7fbfb6
[~maliev] Thank you for the contribution.

> Race between creating table and getting table, between creating schema and getting schema
> -----------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17428
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17428
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Assignee: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current version of TableManager#tableAsyncInternal can fail to detect a
> table that is being created while tableAsyncInternal is called. Scenario:
> - tableAsyncInternal checks tablesByIdVv.latest() and there is no table
> - the table creation starts, and the table metadata appears in the meta storage
> - TableEvent.CREATE is fired
> - tableAsyncInternal registers a listener for TableEvent.CREATE (after it has already been fired for the corresponding table)
> - tableAsyncInternal checks tablesByIdVv.latest() once again and there is still no table, because the table creation is not completed
> - the {{!isTableConfigured(id)}} condition returns *false* as the table is present in the meta storage
> - {{if (tbl != null && getTblFut.complete(tbl) || !isTableConfigured(id) && getTblFut.complete(null))}} evaluates to *false*, and the future created for getTable never completes.
> Possibly we should use VersionedValue#whenComplete instead of registering a
> listener for the event. The table is present in the map wrapped in the
> versioned value only when the table creation is completed, and whenComplete
> allows us to register a callback that checks the table's presence.
> The same problem is present for {{SchemaManager}} when we get a schema in
> {{SchemaManager#tableSchema}}.
> A possible fix for {{SchemaManager}} is to use this pattern
> {code:java}
> registriesVv.whenComplete((token, val, e) -> {
>     if (schemaVer <= val.get(tblId).lastSchemaVersion()) {
>         fut.complete(getSchemaDescriptorLocally(schemaVer, tblCfg));
>     }
> });
> {code}
> instead of registering a listener for the CREATE event. The same approach can
> be used for {{TableManager}}.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
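[Editorial note] The race described above is the classic register-after-fire problem: an event listener registered after the event was fired sees nothing, while a callback attached to the future itself runs regardless of ordering. A minimal JDK-only sketch of why the whenComplete-style fix is ordering-safe (this stands in for Ignite's VersionedValue, it is not the actual TableManager code):

```java
import java.util.concurrent.CompletableFuture;

// A callback attached with whenComplete fires even if the future was
// already completed before the callback was registered, which is exactly
// the property an event listener lacks in the race described above.
public class CallbackDemo {
    public static String awaitTable(CompletableFuture<String> tableFut) {
        CompletableFuture<String> result = new CompletableFuture<>();
        tableFut.whenComplete((tbl, err) -> {
            if (err != null) result.completeExceptionally(err);
            else result.complete(tbl);
        });
        return result.join();
    }

    public static void main(String[] args) {
        CompletableFuture<String> fut = new CompletableFuture<>();
        fut.complete("ACCOUNTS");            // the "event" fires first...
        System.out.println(awaitTable(fut)); // ...the callback still sees it
    }
}
```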
[jira] [Commented] (IGNITE-17477) Redesign RAFT commands in accordance with replication layer
[ https://issues.apache.org/jira/browse/IGNITE-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581863#comment-17581863 ]

Sergey Uttsel commented on IGNITE-17477:
----------------------------------------

LGTM

> Redesign RAFT commands in accordance with replication layer
> -----------------------------------------------------------
>
>                 Key: IGNITE-17477
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17477
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3, transaction3_rw
>
> After we have implemented a replication layer, some of the RAFT commands have
> become useless: _GetAndDeleteCommand, UpsertCommand, GetAllCommand,
> GetCommand, DeleteExactCommand_ (the list can change), and others require
> modification, because every RAFT command should accept a _rowId_ and never
> try to match a row to its id (that is already done by the replication layer).
> We also need to extract the primary index (for now, it is a map) from the RAFT
> state machine. The replication layer will use it for reads, while the state
> machine will use it for modifications only.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (IGNITE-17542) Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 became flaky after IGNITE-17507
[ https://issues.apache.org/jira/browse/IGNITE-17542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581802#comment-17581802 ]

Vyacheslav Koptilin commented on IGNITE-17542:
----------------------------------------------

Hello [~ivandasch],
I have fixed the PR in accordance with your comments. Could you please take a look again?

> Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 became flaky after IGNITE-17507
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17542
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17542
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.14
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>             Fix For: 2.14
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2
> became flaky due to IGNITE-17507.
> The root cause of the issue is that _CacheAffinityChangeMessage_ mutates the
> message outside the _disco-notifier_ thread, and this fact may lead to the
> following exception:
> {noformat}
> [2022-08-16T21:10:32,133][ERROR][tcp-disco-msg-worker-[0448095b 127.0.0.1:47502]-#5308%distributed.CacheLateAffinityAssignmentTest3%-#98199%distributed.CacheLateAffinityAssignmentTest3%][TestTcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
> org.apache.ignite.IgniteException: Failed to marshal mutable discovery message: CacheAffinityChangeMessage [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], partsMsg=GridDhtPartitionsFullMessage [parts=HashMap {-2100569601=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c3=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=111, size=100], 0448095b-02d8-470c-ab90-6a5bcf82=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=116, size=100]}, 1251687457=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c3=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=1035, size=1024], 0448095b-02d8-470c-ab90-6a5bcf82=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=4, minorTopVer=0], updateSeq=3, size=0]}}, partCntrs=IgniteDhtPartitionCountersMap [], partCntrs2=null, partHistSuppliers=IgniteDhtPartitionHistorySuppliersMap [], partsToReload=IgniteDhtPartitionsToReloadMap [], topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], errs=null, resTopVer=null, flags=0, partCnt=2, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], lastVer=GridCacheVersion [topVer=0, order=1660673425660, nodeOrder=0, dataCenterId=0], super=GridCacheMessage [msgId=-1, depInfo=null, lastAffChangedTopVer=null, err=null, skipPrepare=false]]], exchangeNeeded=false, stopProc=false]
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6423) ~[classes/:?]
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:6243) ~[classes/:?]
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3260) ~[classes/:?]
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2918) ~[classes/:?]
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8058) ~[classes/:?]
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3089) [classes/:?]
> 	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [classes/:?]
> 	at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7989) [classes/:?]
> 	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58) [classes/:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Failed to serialize object: CacheAffinityChangeMessage [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], partsMsg=GridDhtPartitionsFullMessage [parts=HashMap {-2100569601=GridDhtPartitionFullMap
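[Editorial note] Why mutating a message outside the owning thread breaks marshalling: serialization walks the message's collections, and a mutation arriving mid-walk invalidates the traversal. The effect can be reproduced deterministically with a plain HashMap standing in for the partition map (this is an analogy, not Ignite's marshaller):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ConcurrentModificationException;

// Inserting into a HashMap while iterating it (as a marshaller would when
// walking a message's fields) fails fast on the next iterator step.
public class MutationDemo {
    public static boolean failsOnMutation() {
        Map<Integer, String> parts = new HashMap<>();
        for (int i = 0; i < 16; i++)
            parts.put(i, "P" + i);
        try {
            for (Integer k : parts.keySet())
                parts.put(100, "late"); // "mutation" arriving mid-traversal
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // traversal invalidated, like the marshal failure above
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnMutation());
    }
}
```

Confining all mutations to the disco-notifier thread (the fix direction discussed in this ticket) removes the possibility of such mid-marshal changes.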
[jira] [Updated] (IGNITE-17557) Test ItPublicApiColocationTest hangs
[ https://issues.apache.org/jira/browse/IGNITE-17557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yury Gerzhedovich updated IGNITE-17557:
---------------------------------------
    Component/s: sql

> Test ItPublicApiColocationTest hangs
> ------------------------------------
>
>                 Key: IGNITE-17557
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17557
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Yury Gerzhedovich
>            Priority: Major
>              Labels: ignite-3
>
> Periodically the test ItPublicApiColocationTest hangs on createTable. It is
> easily reproducible on [TC|https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_ModuleRunner?branch=%3Cdefault%3E=overview=builds];
> on a local machine I can't reproduce it. Let's investigate the reason and fix it.
> {noformat}
> "%ItPublicApiColocationTest_null_0%sql-execution-pool-1" #4948 daemon prio=5 os_prio=0 cpu=1353.84ms elapsed=3234.89s tid=0x7fe606d95000 nid=0x33af waiting on condition [0x7fe0c293c000]
>    java.lang.Thread.State: WAITING (parking)
> 	at jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
> 	- parking to wait for <0x000741bd4a08> (a java.util.concurrent.CompletableFuture$Signaller)
> 	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.8/LockSupport.java:194)
> 	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.8/CompletableFuture.java:1796)
> 	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.8/ForkJoinPool.java:3128)
> 	at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.8/CompletableFuture.java:1823)
> 	at java.util.concurrent.CompletableFuture.join(java.base@11.0.8/CompletableFuture.java:2043)
> 	at org.apache.ignite.internal.table.distributed.TableManager.join(TableManager.java:1359)
> 	at org.apache.ignite.internal.table.distributed.TableManager.createTable(TableManager.java:812)
> 	at org.apache.ignite.internal.sql.engine.exec.ddl.DdlCommandHandler.handleCreateTable(DdlCommandHandler.java:158)
> 	at org.apache.ignite.internal.sql.engine.exec.ddl.DdlCommandHandler.handle(DdlCommandHandler.java:92)
> 	at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.executeDdl(ExecutionServiceImpl.java:249)
> 	at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.executePlan(ExecutionServiceImpl.java:229)
> 	at org.apache.ignite.internal.sql.engine.SqlQueryProcessor.lambda$query0$13(SqlQueryProcessor.java:424)
> 	at org.apache.ignite.internal.sql.engine.SqlQueryProcessor$$Lambda$2239/0x0008007f9440.apply(Unknown Source)
> 	at java.util.concurrent.CompletableFuture$UniApply.tryFire(java.base@11.0.8/CompletableFuture.java:642)
> 	at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.8/CompletableFuture.java:506)
> 	at java.util.concurrent.CompletableFuture$AsyncSupply.run(java.base@11.0.8/CompletableFuture.java:1705)
> 	at org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:80)
> 	at org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$2242/0x0008009b4840.run(Unknown Source)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.8/ThreadPoolExecutor.java:1128)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.8/ThreadPoolExecutor.java:628)
> 	at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (IGNITE-17557) Test ItPublicApiColocationTest hangs
Yury Gerzhedovich created IGNITE-17557:
------------------------------------------

             Summary: Test ItPublicApiColocationTest hangs
                 Key: IGNITE-17557
                 URL: https://issues.apache.org/jira/browse/IGNITE-17557
             Project: Ignite
          Issue Type: Improvement
            Reporter: Yury Gerzhedovich

Periodically the test ItPublicApiColocationTest hangs on createTable. It is
easily reproducible on [TC|https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_ModuleRunner?branch=%3Cdefault%3E=overview=builds];
on a local machine I can't reproduce it. Let's investigate the reason and fix it.

{noformat}
"%ItPublicApiColocationTest_null_0%sql-execution-pool-1" #4948 daemon prio=5 os_prio=0 cpu=1353.84ms elapsed=3234.89s tid=0x7fe606d95000 nid=0x33af waiting on condition [0x7fe0c293c000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
	- parking to wait for <0x000741bd4a08> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.8/LockSupport.java:194)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.8/CompletableFuture.java:1796)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.8/ForkJoinPool.java:3128)
	at java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.8/CompletableFuture.java:1823)
	at java.util.concurrent.CompletableFuture.join(java.base@11.0.8/CompletableFuture.java:2043)
	at org.apache.ignite.internal.table.distributed.TableManager.join(TableManager.java:1359)
	at org.apache.ignite.internal.table.distributed.TableManager.createTable(TableManager.java:812)
	at org.apache.ignite.internal.sql.engine.exec.ddl.DdlCommandHandler.handleCreateTable(DdlCommandHandler.java:158)
	at org.apache.ignite.internal.sql.engine.exec.ddl.DdlCommandHandler.handle(DdlCommandHandler.java:92)
	at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.executeDdl(ExecutionServiceImpl.java:249)
	at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.executePlan(ExecutionServiceImpl.java:229)
	at org.apache.ignite.internal.sql.engine.SqlQueryProcessor.lambda$query0$13(SqlQueryProcessor.java:424)
	at org.apache.ignite.internal.sql.engine.SqlQueryProcessor$$Lambda$2239/0x0008007f9440.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(java.base@11.0.8/CompletableFuture.java:642)
	at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.8/CompletableFuture.java:506)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(java.base@11.0.8/CompletableFuture.java:1705)
	at org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:80)
	at org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$2242/0x0008009b4840.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.8/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.8/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)
{noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
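[Editorial note] The dump above shows a pool thread parked forever in CompletableFuture.join(). Independent of the root cause, a common hardening for such waits is a bounded join so a hung dependency surfaces as a diagnosable timeout instead of a silent hang. A hedged JDK-only sketch (not the actual TableManager fix):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Bounding a join with orTimeout: a future that never completes turns
// into a visible TimeoutException instead of parking the thread forever.
public class JoinTimeoutDemo {
    public static String awaitOrTimeout(CompletableFuture<String> fut) {
        try {
            return fut.orTimeout(200, TimeUnit.MILLISECONDS).join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException)
                return "TIMED_OUT"; // hang converted into a diagnosable failure
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitOrTimeout(new CompletableFuture<>()));
    }
}
```

A completed future passes straight through; only the hung case is converted, which makes hangs like the one in this ticket show up in test reports with a stack trace pointing at the timed-out wait.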
[jira] [Commented] (IGNITE-17542) Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 became flaky after IGNITE-17507
[ https://issues.apache.org/jira/browse/IGNITE-17542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581776#comment-17581776 ] Ignite TC Bot commented on IGNITE-17542: {panel:title=Branch: [pull/10201/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} {panel:title=Branch: [pull/10201/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel} [TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6737235buildTypeId=IgniteTests24Java8_RunAll] > Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 > became flaky after IGNITE-17507 > > > Key: IGNITE-17542 > URL: https://issues.apache.org/jira/browse/IGNITE-17542 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.14 >Reporter: Vyacheslav Koptilin >Assignee: Vyacheslav Koptilin >Priority: Major > Fix For: 2.14 > > Time Spent: 40m > Remaining Estimate: 0h > > The test > CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 > became flaky due to IGNITE-17507. > The root cause of the issue is that _CacheAffinityChangeMessage_ mutates the > message outside the _disco-notifier_ thread, and this fact may lead to the > following exception: > {noformat} > [2022-08-16T21:10:32,133][ERROR][tcp-disco-msg-worker-[0448095b > 127.0.0.1:47502]-#5308%distributed.CacheLateAffinityAssignmentTest3%-#98199%distributed.CacheLateAffinityAssignmentTest3%][TestTcpDiscoverySpi] > TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node > in order to prevent cluster wide instability. 
> org.apache.ignite.IgniteException: Failed to marshal mutable discovery > message: CacheAffinityChangeMessage > [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, > exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, > minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], > partsMsg=GridDhtPartitionsFullMessage [parts=HashMap > {-2100569601=GridDhtPartitionFullMap > {f57cbb85-44ba-40d1-814e-937f96c3=GridDhtPartitionMap [moving=0, > top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=111, > size=100], 0448095b-02d8-470c-ab90-6a5bcf82=GridDhtPartitionMap > [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], > updateSeq=116, size=100]}, 1251687457=GridDhtPartitionFullMap > {f57cbb85-44ba-40d1-814e-937f96c3=GridDhtPartitionMap [moving=0, > top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=1035, > size=1024], 0448095b-02d8-470c-ab90-6a5bcf82=GridDhtPartitionMap > [moving=0, top=AffinityTopologyVersion [topVer=4, minorTopVer=0], > updateSeq=3, size=0]}}, partCntrs=IgniteDhtPartitionCountersMap [], > partCntrs2=null, partHistSuppliers=IgniteDhtPartitionHistorySuppliersMap [], > partsToReload=IgniteDhtPartitionsToReloadMap [], > topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], errs=null, > resTopVer=null, flags=0, partCnt=2, super=GridDhtPartitionsAbstractMessage > [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, > minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], > lastVer=GridCacheVersion [topVer=0, order=1660673425660, nodeOrder=0, > dataCenterId=0], super=GridCacheMessage [msgId=-1, depInfo=null, > lastAffChangedTopVer=null, err=null, skipPrepare=false]]], > exchangeNeeded=false, stopProc=false] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6423) > ~[classes/:?] 
> at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:6243) > ~[classes/:?] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3260) > ~[classes/:?] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2918) > ~[classes/:?] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8058) > ~[classes/:?] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3089) > [classes/:?] > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) > [classes/:?] > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7989) > [classes/:?] > at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58) > [classes/:?] > Caused by: org.apache.ignite.IgniteCheckedException: Failed to serialize > object: CacheAffinityChangeMessage >
[jira] [Assigned] (IGNITE-17498) Update HeapLockManager in order to support Intention locks
[ https://issues.apache.org/jira/browse/IGNITE-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Uttsel reassigned IGNITE-17498: -- Assignee: Sergey Uttsel (was: Denis Chudov) > Update HeapLockManager in order to support Intention locks > -- > > Key: IGNITE-17498 > URL: https://issues.apache.org/jira/browse/IGNITE-17498 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Assignee: Sergey Uttsel >Priority: Major > Labels: ignite-3, transaction3_rw > > It's required to implement new lock upgrade logic that will consider not only > S and X locks but also IS, IX and SIX. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
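For reference, the compatibility rules behind IS, IX and SIX come from classic multigranularity locking; the sketch below encodes the textbook matrix as an illustration of what the new upgrade logic has to honor (this is not Ignite's actual HeapLockManager code):

```java
public class LockCompat {
    public enum Mode { IS, IX, S, SIX, X }

    // Textbook multigranularity compatibility matrix: rows are the held mode,
    // columns are the requested mode (order: IS, IX, S, SIX, X).
    private static final boolean[][] COMPAT = {
        /* IS  */ { true,  true,  true,  true,  false },
        /* IX  */ { true,  true,  false, false, false },
        /* S   */ { true,  false, true,  false, false },
        /* SIX */ { true,  false, false, false, false },
        /* X   */ { false, false, false, false, false },
    };

    public static boolean compatible(Mode held, Mode requested) {
        return COMPAT[held.ordinal()][requested.ordinal()];
    }

    public static void main(String[] args) {
        // SIX = S + IX: it blocks other S and IX requests but admits IS readers.
        System.out.println(compatible(Mode.SIX, Mode.IS)); // true
        System.out.println(compatible(Mode.SIX, Mode.S));  // false
    }
}
```

A lock upgrade (e.g. S to SIX) can then be validated by checking the requested mode against every other holder's current mode.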
[jira] [Updated] (IGNITE-17457) Cluster locks after the transaction recovery procedure if the tx primary node fail
[ https://issues.apache.org/jira/browse/IGNITE-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Korotkov updated IGNITE-17457: - Ignite Flags: Release Notes Required > Cluster locks after the transaction recovery procedure if the tx primary node > fail > -- > > Key: IGNITE-17457 > URL: https://issues.apache.org/jira/browse/IGNITE-17457 > Project: Ignite > Issue Type: Bug >Reporter: Sergey Korotkov >Assignee: Sergey Korotkov >Priority: Major > Fix For: 2.14 > > Time Spent: 8h 20m > Remaining Estimate: 0h > > The Ignite cluster may be locked (all client operations would block) after the tx > recovery procedure is executed on the tx near & primary node failure. > The prepared transaction may remain uncommitted on the backup node after the > tx recovery, so the partition exchange wouldn't complete and the cluster would > be locked. > > The immediate reason is a race condition in the method: > {code:java} > org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter::markFinalizing(RECOVERY_FINISH){code} > If 2 or more backups are configured, it may be called concurrently for the > same transaction both from the recovery procedure: > {code:java} > IgniteTxManager::commitIfPrepared{code} > and from the tx recovery request handler: > {code:java} > IgniteTxHandler::processCheckPreparedTxRequest{code} > The problem occurs if the thread context is switched between the old finalization status > request and the status update. > > The problematic sequence of events is as follows (the lock will be on > node1): > 1. Start a cluster with 3 nodes (node0, node1, node2) and a cache with 2 backups. > 2. On node2, start and prepare a transaction choosing a key with its primary partition > stored on node2. > 3. Kill node2. > 4. The tx recovery procedure is started both on node0 and node1. > 5. In scope of the recovery procedure, node0 sends a tx recovery request to node1. > 6. 
The following steps are executed on node1 in two threads ("procedure", > which is a system pool thread executing the tx recovery procedure, and > "handler", which is a striped pool thread processing the tx recovery request > sent from node0): > - tx.finalization == NONE > - "procedure": calls markFinalizing(RECOVERY_FINISH) > - "handler": calls markFinalizing(RECOVERY_FINISH) > - "procedure": gets old tx.finalization - it's NONE > - "handler": gets old tx.finalization - it's NONE > - "handler": updates tx.finalization - now it's RECOVERY_FINISH > - "procedure": tries to update tx.finalization via compareAndSet and fails > since the compare fails. > - "procedure": stops transaction processing and does not try to commit it. > - The transaction remains unfinished on node1. > > A reproducer is in the pull request. -- This message was sent by Atlassian Jira (v8.20.10#820010)
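The losing-CAS interleaving described above can be reproduced deterministically with a plain AtomicReference. The sketch below uses hypothetical names (it is not the actual IgniteTxAdapter code); the tolerant variant shows the usual fix shape, where losing the race to an identical target value still counts as success:

```java
import java.util.concurrent.atomic.AtomicReference;

public class MarkFinalizingSketch {
    public enum Status { NONE, RECOVERY_FINISH }

    public final AtomicReference<Status> finalization = new AtomicReference<>(Status.NONE);

    // Tolerant shape: if the CAS fails because another thread already installed
    // the same target status, report success instead of giving up on the tx.
    public boolean markFinalizingTolerant(Status target) {
        Status old = finalization.get();
        return finalization.compareAndSet(old, target) || finalization.get() == target;
    }

    public static void main(String[] args) {
        MarkFinalizingSketch tx = new MarkFinalizingSketch();
        // "procedure" reads the old status...
        Status old = tx.finalization.get();                 // NONE
        // ..."handler" wins the race and installs RECOVERY_FINISH...
        tx.finalization.set(Status.RECOVERY_FINISH);
        // ...so the procedure's strict CAS fails even though the value is
        // exactly what it wanted -- the failure mode described above.
        System.out.println(tx.finalization.compareAndSet(old, Status.RECOVERY_FINISH)); // false
        // A tolerant check treats "already at the target status" as success.
        System.out.println(tx.markFinalizingTolerant(Status.RECOVERY_FINISH));          // true
    }
}
```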
[jira] [Assigned] (IGNITE-17505) Document CREATE INDEX command
[ https://issues.apache.org/jira/browse/IGNITE-17505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Gusev reassigned IGNITE-17505: --- Assignee: Igor Gusev > Document CREATE INDEX command > - > > Key: IGNITE-17505 > URL: https://issues.apache.org/jira/browse/IGNITE-17505 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Igor Gusev >Assignee: Igor Gusev >Priority: Major > Labels: ignite-3 > > In the https://issues.apache.org/jira/browse/IGNITE-17429 ticket, a new > CREATE INDEX command was added. We need to document it -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-17457) Cluster locks after the transaction recovery procedure if the tx primary node fail
[ https://issues.apache.org/jira/browse/IGNITE-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Korotkov updated IGNITE-17457: - Ignite Flags: (was: Release Notes Required) > Cluster locks after the transaction recovery procedure if the tx primary node > fail > -- > > Key: IGNITE-17457 > URL: https://issues.apache.org/jira/browse/IGNITE-17457 > Project: Ignite > Issue Type: Bug >Reporter: Sergey Korotkov >Assignee: Sergey Korotkov >Priority: Major > Fix For: 2.14 > > Time Spent: 8h 20m > Remaining Estimate: 0h > > The Ignite cluster may be locked (all client operations would block) after the tx > recovery procedure is executed on the tx near & primary node failure. > The prepared transaction may remain uncommitted on the backup node after the > tx recovery, so the partition exchange wouldn't complete and the cluster would > be locked. > > The immediate reason is a race condition in the method: > {code:java} > org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter::markFinalizing(RECOVERY_FINISH){code} > If 2 or more backups are configured, it may be called concurrently for the > same transaction both from the recovery procedure: > {code:java} > IgniteTxManager::commitIfPrepared{code} > and from the tx recovery request handler: > {code:java} > IgniteTxHandler::processCheckPreparedTxRequest{code} > The problem occurs if the thread context is switched between the old finalization status > request and the status update. > > The problematic sequence of events is as follows (the lock will be on > node1): > 1. Start a cluster with 3 nodes (node0, node1, node2) and a cache with 2 backups. > 2. On node2, start and prepare a transaction choosing a key with its primary partition > stored on node2. > 3. Kill node2. > 4. The tx recovery procedure is started both on node0 and node1. > 5. In scope of the recovery procedure, node0 sends a tx recovery request to node1. > 6. 
The following steps are executed on node1 in two threads ("procedure", > which is a system pool thread executing the tx recovery procedure, and > "handler", which is a striped pool thread processing the tx recovery request > sent from node0): > - tx.finalization == NONE > - "procedure": calls markFinalizing(RECOVERY_FINISH) > - "handler": calls markFinalizing(RECOVERY_FINISH) > - "procedure": gets old tx.finalization - it's NONE > - "handler": gets old tx.finalization - it's NONE > - "handler": updates tx.finalization - now it's RECOVERY_FINISH > - "procedure": tries to update tx.finalization via compareAndSet and fails > since the compare fails. > - "procedure": stops transaction processing and does not try to commit it. > - The transaction remains unfinished on node1. > > A reproducer is in the pull request. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-17457) Cluster locks after the transaction recovery procedure if the tx primary node fail
[ https://issues.apache.org/jira/browse/IGNITE-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Korotkov updated IGNITE-17457: - Release Note: Fixed potential deadlock in transactions recovery on node failure. > Cluster locks after the transaction recovery procedure if the tx primary node > fail > -- > > Key: IGNITE-17457 > URL: https://issues.apache.org/jira/browse/IGNITE-17457 > Project: Ignite > Issue Type: Bug >Reporter: Sergey Korotkov >Assignee: Sergey Korotkov >Priority: Major > Fix For: 2.14 > > Time Spent: 8h 20m > Remaining Estimate: 0h > > The Ignite cluster may be locked (all client operations would block) after the tx > recovery procedure is executed on the tx near & primary node failure. > The prepared transaction may remain uncommitted on the backup node after the > tx recovery, so the partition exchange wouldn't complete and the cluster would > be locked. > > The immediate reason is a race condition in the method: > {code:java} > org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter::markFinalizing(RECOVERY_FINISH){code} > If 2 or more backups are configured, it may be called concurrently for the > same transaction both from the recovery procedure: > {code:java} > IgniteTxManager::commitIfPrepared{code} > and from the tx recovery request handler: > {code:java} > IgniteTxHandler::processCheckPreparedTxRequest{code} > The problem occurs if the thread context is switched between the old finalization status > request and the status update. > > The problematic sequence of events is as follows (the lock will be on > node1): > 1. Start a cluster with 3 nodes (node0, node1, node2) and a cache with 2 backups. > 2. On node2, start and prepare a transaction choosing a key with its primary partition > stored on node2. > 3. Kill node2. > 4. The tx recovery procedure is started both on node0 and node1. > 5. In scope of the recovery procedure, node0 sends a tx recovery request to node1. > 6. 
The following steps are executed on node1 in two threads ("procedure", > which is a system pool thread executing the tx recovery procedure, and > "handler", which is a striped pool thread processing the tx recovery request > sent from node0): > - tx.finalization == NONE > - "procedure": calls markFinalizing(RECOVERY_FINISH) > - "handler": calls markFinalizing(RECOVERY_FINISH) > - "procedure": gets old tx.finalization - it's NONE > - "handler": gets old tx.finalization - it's NONE > - "handler": updates tx.finalization - now it's RECOVERY_FINISH > - "procedure": tries to update tx.finalization via compareAndSet and fails > since the compare fails. > - "procedure": stops transaction processing and does not try to commit it. > - The transaction remains unfinished on node1. > > A reproducer is in the pull request. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-17354) Metrics framework
[ https://issues.apache.org/jira/browse/IGNITE-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vyacheslav Koptilin updated IGNITE-17354: - Fix Version/s: 3.0.0-alpha6 > Metrics framework > -- > > Key: IGNITE-17354 > URL: https://issues.apache.org/jira/browse/IGNITE-17354 > Project: Ignite > Issue Type: New Feature >Reporter: Denis Chudov >Assignee: Denis Chudov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-alpha6 > > > *Metrics types* > The metrics framework should provide the following metric types: > - Gauge - an instantaneous measurement of a value provided by some > existing component. Gauge should support the primitive types int, long and double. > - Metric - a wrapper over a numeric value which can be increased or > decreased by some value. Metric should support the primitive types int, long and > double. > - Hit Rate - accumulates approximate hit rate statistics based on hits in the > last time interval. > - Distribution - distributes values into buckets where each bucket represents > some numeric interval (Histogram in AI 2). Internal type - primitive long > (should be enough). > *Concurrency characteristics* > For scalar numeric metrics it is enough to have an atomic number (e.g. > AtomicInteger) or a striped number (e.g. LongAdder). These approaches affect > memory footprint and performance differently. > *Design* > Metrics should have the same life cycle as the component that produces > them. So metrics related to some particular component should be tied > together in a MetricsSet. The only purpose of a metrics set is to provide access to > metric values from exporters. Metric instances themselves are placed in a > MetricsSource - an entity which keeps instances of metrics and provides > access to the metrics through an interface that is specific to each metrics > source. A component that produces metrics must control the metrics source life > cycle (create it and register it in the metrics registry, see below). 
> All metrics sources (regardless of whether metrics are enabled or disabled for a > particular metrics source) must be registered in the metrics registry on > component start and removed on component stop. > A MetricsSource itself produces an instance of MetricsSet which should be > registered in the metrics registry on the "metrics enabled" event and unregistered on > the "metrics disabled" event. > The metrics registry provides access to all metrics sets from the exporters' side. > It is possible that the metrics registry is overloaded with functionality (management > of metrics sources and metrics sets), so a special component may be needed > for these purposes (e.g. a metrics manager). > Each instance of a metric has a name (local to some metric set) and a > description. The full metric name is a concatenation of the metrics source > name and the metric name, separated by a dot. > For composite metrics like distribution we should treat each metric inside > (e.g. each range) as a separate metric. So the full name of each internal > metric will be metrics source + dot + metric instance name + dot + range as a > string (e.g. 0_100). > A metrics set must be immutable in order to meet the requirements described in > the epic. > The data structure (likely a map) that is responsible for keeping the enabled metrics > sets should be modified using copy-on-write semantics in order to avoid data > races between the main functionality (metrics enabling/disabling) and the exporters. -- This message was sent by Atlassian Jira (v8.20.10#820010)
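The copy-on-write registry described in the design above can be sketched like this (hypothetical class and method names; the real Ignite 3 metrics API may differ):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class MetricsRegistrySketch {
    // volatile reference to an immutable map: exporters read a consistent
    // snapshot without locking, writers swap in a full copy under a monitor.
    private volatile Map<String, AtomicLong> enabled = Collections.emptyMap();

    // Full metric name = source + '.' + metric, per the naming scheme above.
    public synchronized void enable(String source, String metric) {
        Map<String, AtomicLong> copy = new HashMap<>(enabled);
        copy.put(source + "." + metric, new AtomicLong());
        enabled = Collections.unmodifiableMap(copy);
    }

    public synchronized void disable(String source, String metric) {
        Map<String, AtomicLong> copy = new HashMap<>(enabled);
        copy.remove(source + "." + metric);
        enabled = Collections.unmodifiableMap(copy);
    }

    // Exporter-side view: safe to iterate at any time, never blocks writers,
    // and is unaffected by later enable/disable calls.
    public Map<String, AtomicLong> snapshot() {
        return enabled;
    }
}
```

A snapshot taken by an exporter stays valid even if a metric is disabled mid-export, which is exactly the data race the copy-on-write semantics are meant to avoid.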
[jira] [Assigned] (IGNITE-17452) [Extensions] Implement Kafka to thin client CDC streamer
[ https://issues.apache.org/jira/browse/IGNITE-17452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amelchev Nikita reassigned IGNITE-17452: Assignee: Amelchev Nikita > [Extensions] Implement Kafka to thin client CDC streamer > > > Key: IGNITE-17452 > URL: https://issues.apache.org/jira/browse/IGNITE-17452 > Project: Ignite > Issue Type: Task >Reporter: Amelchev Nikita >Assignee: Amelchev Nikita >Priority: Major > Labels: IEP-59, ise > > Implement Kafka to thin client CDC streamer -- This message was sent by Atlassian Jira (v8.20.10#820010)