[jira] [Updated] (IGNITE-17190) Calcite engine. Unbind statistics from H2

2022-08-19 Thread Ivan Daschinsky (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Daschinsky updated IGNITE-17190:
-
Release Note: 
Move SQL statistics to core module
Fix SQL ANALYZE, REFRESH STATISTICS and DROP STATISTICS command

  was:
* Move SQL statistics to core module
* Fix SQL ANALYZE, REFRESH STATISTICS and DROP STATISTICS command


> Calcite engine. Unbind statistics from H2
> -
>
> Key: IGNITE-17190
> URL: https://issues.apache.org/jira/browse/IGNITE-17190
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Aleksey Plekhanov
>Assignee: Ivan Daschinsky
>Priority: Major
>  Labels: calcite, calcite2-required, ignite-osgi
> Fix For: 2.14
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for 
> example). We should unbind statistics from H2 and move them to the core 
> module so that the calcite module can use them without a dependency on H2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-17190) Calcite engine. Unbind statistics from H2

2022-08-19 Thread Ivan Daschinsky (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Daschinsky updated IGNITE-17190:
-
Release Note: 
* Move SQL statistics to core module
* Fix SQL ANALYZE, REFRESH STATISTICS and DROP STATISTICS command

> Calcite engine. Unbind statistics from H2
> -
>
> Key: IGNITE-17190
> URL: https://issues.apache.org/jira/browse/IGNITE-17190
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Aleksey Plekhanov
>Assignee: Ivan Daschinsky
>Priority: Major
>  Labels: calcite, calcite2-required, ignite-osgi
> Fix For: 2.14
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for 
> example). We should unbind statistics from H2 and move them to the core 
> module so that the calcite module can use them without a dependency on H2.





[jira] [Commented] (IGNITE-17190) Calcite engine. Unbind statistics from H2

2022-08-19 Thread Ivan Daschinsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581992#comment-17581992
 ] 

Ivan Daschinsky commented on IGNITE-17190:
--

Merged to master, cherry-picked to ignite-2.14

> Calcite engine. Unbind statistics from H2
> -
>
> Key: IGNITE-17190
> URL: https://issues.apache.org/jira/browse/IGNITE-17190
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Aleksey Plekhanov
>Assignee: Ivan Daschinsky
>Priority: Major
>  Labels: calcite, calcite2-required, ignite-osgi
> Fix For: 2.14
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for 
> example). We should unbind statistics from H2 and move them to the core 
> module so that the calcite module can use them without a dependency on H2.





[jira] [Commented] (IGNITE-17190) Calcite engine. Unbind statistics from H2

2022-08-19 Thread Ivan Daschinsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581983#comment-17581983
 ] 

Ivan Daschinsky commented on IGNITE-17190:
--

[~alex_pl] Thanks for the review.

> Calcite engine. Unbind statistics from H2
> -
>
> Key: IGNITE-17190
> URL: https://issues.apache.org/jira/browse/IGNITE-17190
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Aleksey Plekhanov
>Assignee: Ivan Daschinsky
>Priority: Major
>  Labels: calcite, calcite2-required, ignite-osgi
> Fix For: 2.14
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for 
> example). We should unbind statistics from H2 and move them to the core 
> module so that the calcite module can use them without a dependency on H2.





[jira] [Commented] (IGNITE-17190) Calcite engine. Unbind statistics from H2

2022-08-19 Thread Ivan Daschinsky (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581982#comment-17581982
 ] 

Ivan Daschinsky commented on IGNITE-17190:
--

The two failures are not related to the patch; I'm just tired of rerunning the tests.

> Calcite engine. Unbind statistics from H2
> -
>
> Key: IGNITE-17190
> URL: https://issues.apache.org/jira/browse/IGNITE-17190
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Aleksey Plekhanov
>Assignee: Ivan Daschinsky
>Priority: Major
>  Labels: calcite, calcite2-required, ignite-osgi
> Fix For: 2.14
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for 
> example). We should unbind statistics from H2 and move them to the core 
> module so that the calcite module can use them without a dependency on H2.





[jira] [Commented] (IGNITE-17190) Calcite engine. Unbind statistics from H2

2022-08-19 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581981#comment-17581981
 ] 

Ignite TC Bot commented on IGNITE-17190:


{panel:title=Branch: [pull/10175/head] Base: [master] : Possible Blockers 
(2)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Cache 6{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=6739921]]
* IgniteCacheTestSuite6: CacheExchangeMergeTest.testMergeServersFail1_1 - Test 
has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Platform .NET (Windows){color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=6739863]]
* exe: ClientHeartbeatTest.TestServerDisconnectsIdleClientWithoutHeartbeats - 
Test has low fail rate in base branch 0,0% and is not flaky

{panel}
{panel:title=Branch: [pull/10175/head] Base: [master] : New Tests 
(12)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}Queries 3 (lazy=true){color} [[tests 
6|https://ci.ignite.apache.org/viewLog.html?buildId=6739346]]
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: 
SqlStatisticsCommandTests.statisticsLexemaTest - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: 
SqlStatisticsCommandTests.testRefreshNotExistStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: 
SqlStatisticsCommandTests.testAnalyze - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: 
SqlStatisticsCommandTests.testDropNotExistStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: 
SqlStatisticsCommandTests.testDropStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryLazyTestSuite3: 
SqlStatisticsCommandTests.testRefreshStatistics - PASSED{color}

{color:#8b}Queries 3{color} [[tests 
6|https://ci.ignite.apache.org/viewLog.html?buildId=6739345]]
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: 
SqlStatisticsCommandTests.statisticsLexemaTest - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: 
SqlStatisticsCommandTests.testRefreshNotExistStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: 
SqlStatisticsCommandTests.testAnalyze - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: 
SqlStatisticsCommandTests.testDropNotExistStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: 
SqlStatisticsCommandTests.testDropStatistics - PASSED{color}
* {color:#013220}IgniteBinaryCacheQueryTestSuite3: 
SqlStatisticsCommandTests.testRefreshStatistics - PASSED{color}

{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=6739371buildTypeId=IgniteTests24Java8_RunAll]

> Calcite engine. Unbind statistics from H2
> -
>
> Key: IGNITE-17190
> URL: https://issues.apache.org/jira/browse/IGNITE-17190
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Aleksey Plekhanov
>Assignee: Ivan Daschinsky
>Priority: Major
>  Labels: calcite, calcite2-required, ignite-osgi
> Fix For: 2.14
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, table statistics in Ignite use some H2 classes (Value, for 
> example). We should unbind statistics from H2 and move them to the core 
> module so that the calcite module can use them without a dependency on H2.





[jira] [Commented] (IGNITE-16136) System Thread pool starvation and out of memory

2022-08-19 Thread Maxim Muzafarov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-16136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581965#comment-17581965
 ] 

Maxim Muzafarov commented on IGNITE-16136:
--

Folks, 

I'm able to reproduce the issue. I will fix it soon.

> System Thread pool starvation and out of memory
> ---
>
> Key: IGNITE-16136
> URL: https://issues.apache.org/jira/browse/IGNITE-16136
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7.6
>Reporter: David Albrecht
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: ise
> Fix For: 2.14
>
> Attachments: configuration.zip, image-2021-12-15-21-13-43-775.png, 
> image-2021-12-15-21-17-47-652.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are experiencing thread pool starvation and, after some time, out-of-memory 
> exceptions in some of our Ignite client nodes, while the server node seems to 
> be running without any problems. It seems that all sys threads are stuck when 
> calling MarshallerContextImpl.getClassName, which in turn leads to a growing 
> worker queue.
>  
> First warnings regarding the thread pool starvation:
> {code:java}
> 10.12.21 11:22:34.603 [WARN ] 
> IgniteKernal.warning(127): Possible thread pool starvation detected (no task 
> completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:27:34.654 [WARN ] 
> IgniteKernal.warning(127): Possible thread pool starvation detected (no task 
> completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:32:34.713 [WARN ] 
> IgniteKernal.warning(127): Possible thread pool starvation detected (no task 
> completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:37:34.764 [WARN ] 
> IgniteKernal.warning(127): Possible thread pool starvation detected (no task 
> completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:42:34.796 [WARN ] 
> IgniteKernal.warning(127): Possible thread pool starvation detected (no task 
> completed in last 3ms, is system thread pool size large enough?)
> 10.12.21 11:47:34.839 [WARN ] 
> IgniteKernal.warning(127): Possible thread pool starvation detected (no task 
> completed in last 3ms, is system thread pool size large enough?)
> {code}
> Out of memory error leading to a crash of the application:
> {code}
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "https-openssl-nio-16443-ClientPoller"
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "ajp-nio-16009-ClientPoller"
> 11-Dec-2021 03:07:24.446 SEVERE [Catalina-utility-1] 
> org.apache.coyote.AbstractProtocol.startAsyncTimeout Error processing async 
> timeouts
>   java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: 
> Java heap space
> {code}
> The queue full of messages:
>  !image-2021-12-15-21-17-47-652.png! 
> It seems like all sys threads are stuck while waiting at:
> {code}
> sys-#170
>   at jdk.internal.misc.Unsafe.park(ZJ)V (Native Method)
>   at java.util.concurrent.locks.LockSupport.park()V (LockSupport.java:323)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(Z)Ljava/lang/Object;
>  (GridFutureAdapter.java:178)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get()Ljava/lang/Object;
>  (GridFutureAdapter.java:141)
>   at 
> org.apache.ignite.internal.MarshallerContextImpl.getClassName(BI)Ljava/lang/String;
>  (MarshallerContextImpl.java:379)
>   at 
> org.apache.ignite.internal.MarshallerContextImpl.getClass(ILjava/lang/ClassLoader;)Ljava/lang/Class;
>  (MarshallerContextImpl.java:344)
>   at 
> org.apache.ignite.internal.marshaller.optimized.OptimizedMarshallerUtils.classDescriptor(Ljava/util/concurrent/ConcurrentMap;ILjava/lang/ClassLoader;Lorg/apache/ignite/marshaller/MarshallerContext;Lorg/apache/ignite/internal/marshaller/optimized/OptimizedMarshallerIdMapper;)Lorg/apache/ignite/internal/marshaller/optimized/OptimizedClassDescriptor;
>  (OptimizedMarshallerUtils.java:264)
>   at 
> org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObject0()Ljava/lang/Object;
>  (OptimizedObjectInputStream.java:341)
>   at 
> org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride()Ljava/lang/Object;
>  (OptimizedObjectInputStream.java:198)
>   at 
> java.io.ObjectInputStream.readObject(Ljava/lang/Class;)Ljava/lang/Object; 
> (ObjectInputStream.java:484)
>   at java.io.ObjectInputStream.readObject()Ljava/lang/Object; 

[jira] [Commented] (IGNITE-17349) Ignite3 CLI output formatting

2022-08-19 Thread Vyacheslav Koptilin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581925#comment-17581925
 ] 

Vyacheslav Koptilin commented on IGNITE-17349:
--

Hello [~aleksandr.pakhomov],

In general, the patch looks good to me. Will check it again and merge.

> Ignite3 CLI output formatting
> -
>
> Key: IGNITE-17349
> URL: https://issues.apache.org/jira/browse/IGNITE-17349
> Project: Ignite
>  Issue Type: Task
>  Components: cli
>Reporter: Aleksandr
>Assignee: Aleksandr
>Priority: Major
>  Labels: ignite-3, ignite-3-cli-tool
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Ignite3 CLI is currently inconsistent in its formatting and styles. 
> Messages about what went wrong differ from each other. In some places 'Done!' is a 
> marker of a successful operation ({{ignite bootstrap}}), in others it is just a 
> sentence notifying that something is done ({{ignite connect}}). Tables are 
> rendered with different borders for the {{ignite bootstrap}}, {{ignite node 
> list}} and {{ignite topology}} commands.
> The goal of this ticket is to develop user-facing interface components and 
> use them in the CLI code. The list of the components is also a part of this 
> ticket but here are some of them:
> - problem json render
> - table render
> - success action render
> - suggestion render.





[jira] [Comment Edited] (IGNITE-17477) Redesign RAFT commands in accordance with replication layer

2022-08-19 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581917#comment-17581917
 ] 

Vladislav Pyatkov edited comment on IGNITE-17477 at 8/19/22 3:22 PM:
-

Merged in the TX branch: 
[ignite3_tx|https://github.com/gridgain/apache-ignite-3/commits/ignite3_tx]
b875396e6beda6b85cbb67bf77399663087b49c4


was (Author: v.pyatkov):
Merged in the TX branch: b875396e6beda6b85cbb67bf77399663087b49c4

> Redesign RAFT commands in accordance with replication layer
> ---
>
> Key: IGNITE-17477
> URL: https://issues.apache.org/jira/browse/IGNITE-17477
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3, transaction3_rw
>
> Now that a replication layer has been implemented, some of the RAFT commands 
> have become useless: _GetAndDeleteCommand, UpsertCommand, GetAllCommand, 
> GetCommand, DeleteExactCommand_ (the list may change), and others 
> require modification, because every RAFT command should operate on _rowId_ and 
> never try to match a row to its id (this is already done by the replication 
> layer).
> It is also required to extract the primary index (for now, it is a map) from the RAFT 
> state machine. It will be used by the replication layer for reads, while the 
> state machine will use it for modification only.





[jira] [Commented] (IGNITE-17477) Redesign RAFT commands in accordance with replication layer

2022-08-19 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581917#comment-17581917
 ] 

Vladislav Pyatkov commented on IGNITE-17477:


Merged in the TX branch: b875396e6beda6b85cbb67bf77399663087b49c4

> Redesign RAFT commands in accordance with replication layer
> ---
>
> Key: IGNITE-17477
> URL: https://issues.apache.org/jira/browse/IGNITE-17477
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3, transaction3_rw
>
> Now that a replication layer has been implemented, some of the RAFT commands 
> have become useless: _GetAndDeleteCommand, UpsertCommand, GetAllCommand, 
> GetCommand, DeleteExactCommand_ (the list may change), and others 
> require modification, because every RAFT command should operate on _rowId_ and 
> never try to match a row to its id (this is already done by the replication 
> layer).
> It is also required to extract the primary index (for now, it is a map) from the RAFT 
> state machine. It will be used by the replication layer for reads, while the 
> state machine will use it for modification only.





[jira] [Commented] (IGNITE-17428) Race between creating table and getting table, between creating schema and getting schema

2022-08-19 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581910#comment-17581910
 ] 

Vladislav Pyatkov commented on IGNITE-17428:


Merged e2b4205b018d51eea26aada42b1389e92a7fbfb6
[~maliev] Thank you for the contribution.

> Race between creating table and getting table, between creating schema and 
> getting schema
> -
>
> Key: IGNITE-17428
> URL: https://issues.apache.org/jira/browse/IGNITE-17428
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Chudov
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The current version of TableManager#tableAsyncInternal may fail to detect a 
> table that is being created while tableAsyncInternal is called. Scenario:
> - tableAsyncInternal checks tablesByIdVv.latest() and there is no table
> - the table creation started, table metadata appears in meta storage
> - TableEvent.CREATE is fired
> - tableAsyncInternal registers a listener for TableEvent.CREATE (after it is 
> fired for corresponding table)
> - tableAsyncInternal checks tablesByIdVv.latest() once again and there still 
> is no table, because the table creation is not completed
> - {{!isTableConfigured(id)}} condition returns *false* as the table is 
> present in meta storage
> - {{if (tbl != null && getTblFut.complete(tbl) || !isTableConfigured(id) && 
> getTblFut.complete(null))}} evaluates to *false* and the future created for 
> getTable never completes.
> We should possibly use VersionedValue#whenComplete instead of creating a 
> listener for the event. The table is present in the map wrapped in the versioned 
> value only when the table creation is completed, and whenComplete allows 
> registering a callback that checks for the table's presence.
> The same problem is present in {{SchemaManager}} when we get a schema in 
> {{SchemaManager#tableSchema}}.
> A possible fix for {{SchemaManager}} is to use this pattern: 
> {code:java}
> registriesVv.whenComplete((token, val, e) -> {
>     if (schemaVer <= val.get(tblId).lastSchemaVersion()) {
>         fut.complete(getSchemaDescriptorLocally(schemaVer, tblCfg));
>     }
> });
> {code}
> instead of creating a listener for the CREATE event. The same approach can be used 
> for {{TableManager}}.
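
The interleaving described in the scenario above can be replayed in a small, single-threaded sketch. Everything in it is an assumption made for illustration: Slot is a hypothetical stand-in for a versioned value, CreateEvent is a hypothetical event bus, and neither reflects the actual Ignite 3 API or the merged fix. It only shows why a listener registered after the event has fired leaves the future incomplete, while a callback attached to the value itself does not miss the completion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class TableLookupRaceSketch {
    /** Hypothetical stand-in for a versioned value: keeps the latest value and notifies callbacks. */
    static final class Slot<T> {
        private T value;
        private final List<Consumer<T>> callbacks = new ArrayList<>();

        synchronized T latest() {
            return value;
        }

        synchronized void whenComplete(Consumer<T> callback) {
            callbacks.add(callback);
        }

        void complete(T newValue) {
            List<Consumer<T>> snapshot;
            synchronized (this) {
                value = newValue;
                snapshot = new ArrayList<>(callbacks);
            }
            snapshot.forEach(cb -> cb.accept(newValue));
        }
    }

    /** Hypothetical event bus: listeners added after fire() never see the event. */
    static final class CreateEvent {
        private final List<Runnable> listeners = new ArrayList<>();

        void listen(Runnable listener) {
            listeners.add(listener);
        }

        void fire() {
            listeners.forEach(Runnable::run);
        }
    }

    /** Replays the scenario from the ticket using an event listener. */
    static CompletableFuture<String> lookupWithListener() {
        Slot<String> tables = new Slot<>();
        CreateEvent createEvt = new CreateEvent();
        CompletableFuture<String> fut = new CompletableFuture<>();

        String tbl = tables.latest();                // first check: no table yet
        boolean configured = true;                   // table metadata appears in meta storage
        createEvt.fire();                            // CREATE is fired, nobody listens yet
        createEvt.listen(() -> fut.complete("T1"));  // listener is registered too late
        tbl = tables.latest();                       // second check: creation not completed
        // Same shape as the condition quoted in the ticket: both branches are false here.
        boolean completed = tbl != null && fut.complete(tbl) || !configured && fut.complete(null);
        tables.complete("T1");                       // creation completes, the future is not notified
        return fut;
    }

    /** Same lookup, but subscribed to the value itself before checking it. */
    static CompletableFuture<String> lookupWithCallback() {
        Slot<String> tables = new Slot<>();
        CompletableFuture<String> fut = new CompletableFuture<>();

        tables.whenComplete(tbl -> {
            if (tbl != null) {
                fut.complete(tbl);
            }
        });
        String tbl = tables.latest();                // check after subscribing closes the gap
        if (tbl != null) {
            fut.complete(tbl);
        }
        tables.complete("T1");                       // creation completes, the callback fires
        return fut;
    }

    public static void main(String[] args) {
        System.out.println("listener future done: " + lookupWithListener().isDone());
        System.out.println("callback future done: " + lookupWithCallback().isDone());
    }
}
```

In this sketch the order of operations is the point: subscribing to the value first and checking it afterwards means a completion can be observed either by the check or by the callback, whereas the listener variant depends on registering before the event is fired.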





[jira] [Commented] (IGNITE-17477) Redesign RAFT commands in accordance with replication layer

2022-08-19 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581863#comment-17581863
 ] 

Sergey Uttsel commented on IGNITE-17477:


LGTM

> Redesign RAFT commands in accordance with replication layer
> ---
>
> Key: IGNITE-17477
> URL: https://issues.apache.org/jira/browse/IGNITE-17477
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3, transaction3_rw
>
> Now that a replication layer has been implemented, some of the RAFT commands 
> have become useless: _GetAndDeleteCommand, UpsertCommand, GetAllCommand, 
> GetCommand, DeleteExactCommand_ (the list may change), and others 
> require modification, because every RAFT command should operate on _rowId_ and 
> never try to match a row to its id (this is already done by the replication 
> layer).
> It is also required to extract the primary index (for now, it is a map) from the RAFT 
> state machine. It will be used by the replication layer for reads, while the 
> state machine will use it for modification only.





[jira] [Commented] (IGNITE-17542) Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 became flaky after IGNITE-17507

2022-08-19 Thread Vyacheslav Koptilin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581802#comment-17581802
 ] 

Vyacheslav Koptilin commented on IGNITE-17542:
--

Hello [~ivandasch],

I have fixed the PR in accordance with your comments. Could you please take a 
look again?

> Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 
> became flaky after IGNITE-17507
> 
>
> Key: IGNITE-17542
> URL: https://issues.apache.org/jira/browse/IGNITE-17542
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.14
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.14
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test 
> CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 
> became flaky due to IGNITE-17507.
> The root cause of the issue is that _CacheAffinityChangeMessage_ mutates the 
> message outside the _disco-notifier_ thread, and this may lead to the 
> following exception:
> {noformat}
> [2022-08-16T21:10:32,133][ERROR][tcp-disco-msg-worker-[0448095b 
> 127.0.0.1:47502]-#5308%distributed.CacheLateAffinityAssignmentTest3%-#98199%distributed.CacheLateAffinityAssignmentTest3%][TestTcpDiscoverySpi]
>  TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node 
> in order to prevent cluster wide instability.
>   org.apache.ignite.IgniteException: Failed to marshal mutable discovery 
> message: CacheAffinityChangeMessage 
> [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, 
> exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, 
> minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], 
> partsMsg=GridDhtPartitionsFullMessage [parts=HashMap 
> {-2100569601=GridDhtPartitionFullMap 
> {f57cbb85-44ba-40d1-814e-937f96c3=GridDhtPartitionMap [moving=0, 
> top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=111, 
> size=100], 0448095b-02d8-470c-ab90-6a5bcf82=GridDhtPartitionMap 
> [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], 
> updateSeq=116, size=100]}, 1251687457=GridDhtPartitionFullMap 
> {f57cbb85-44ba-40d1-814e-937f96c3=GridDhtPartitionMap [moving=0, 
> top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=1035, 
> size=1024], 0448095b-02d8-470c-ab90-6a5bcf82=GridDhtPartitionMap 
> [moving=0, top=AffinityTopologyVersion [topVer=4, minorTopVer=0], 
> updateSeq=3, size=0]}}, partCntrs=IgniteDhtPartitionCountersMap [], 
> partCntrs2=null, partHistSuppliers=IgniteDhtPartitionHistorySuppliersMap [], 
> partsToReload=IgniteDhtPartitionsToReloadMap [], 
> topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], errs=null, 
> resTopVer=null, flags=0, partCnt=2, super=GridDhtPartitionsAbstractMessage 
> [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, 
> minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], 
> lastVer=GridCacheVersion [topVer=0, order=1660673425660, nodeOrder=0, 
> dataCenterId=0], super=GridCacheMessage [msgId=-1, depInfo=null, 
> lastAffChangedTopVer=null, err=null, skipPrepare=false]]], 
> exchangeNeeded=false, stopProc=false]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6423)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:6243)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3260)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2918)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8058)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3089)
>  [classes/:?]
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) 
> [classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7989)
>  [classes/:?]
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58) 
> [classes/:?]
>   Caused by: org.apache.ignite.IgniteCheckedException: Failed to serialize 
> object: CacheAffinityChangeMessage 
> [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, 
> exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, 
> minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], 
> partsMsg=GridDhtPartitionsFullMessage [parts=HashMap 
> {-2100569601=GridDhtPartitionFullMap 
> 

[jira] [Updated] (IGNITE-17557) Test ItPublicApiColocationTest hangs

2022-08-19 Thread Yury Gerzhedovich (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yury Gerzhedovich updated IGNITE-17557:
---
Component/s: sql

> Test ItPublicApiColocationTest hangs
> 
>
> Key: IGNITE-17557
> URL: https://issues.apache.org/jira/browse/IGNITE-17557
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Yury Gerzhedovich
>Priority: Major
>  Labels: ignite-3
>
> Periodically the test ItPublicApiColocationTest hangs on createTable. It is 
> easily reproducible on 
> [TC|https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_ModuleRunner?branch=%3Cdefault%3E=overview=builds],
>  but I can't reproduce it on a local machine.
> Let's investigate the reason and fix it.
>   "%ItPublicApiColocationTest_null_0%sql-execution-pool-1" #4948 daemon 
> prio=5 os_prio=0 cpu=1353.84ms elapsed=3234.89s tid=0x7fe606d95000 
> nid=0x33af waiting on condition  [0x7fe0c293c000]
>      java.lang.Thread.State: WAITING (parking)
>     at jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
>     - parking to wait for  <0x000741bd4a08> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>     at 
> java.util.concurrent.locks.LockSupport.park(java.base@11.0.8/LockSupport.java:194)
>     at 
> java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.8/CompletableFuture.java:1796)
>     at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.8/ForkJoinPool.java:3128)
>     at 
> java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.8/CompletableFuture.java:1823)
>     at 
> java.util.concurrent.CompletableFuture.join(java.base@11.0.8/CompletableFuture.java:2043)
>     at 
> org.apache.ignite.internal.table.distributed.TableManager.join(TableManager.java:1359)
>     at 
> org.apache.ignite.internal.table.distributed.TableManager.createTable(TableManager.java:812)
>     at 
> org.apache.ignite.internal.sql.engine.exec.ddl.DdlCommandHandler.handleCreateTable(DdlCommandHandler.java:158)
>     at 
> org.apache.ignite.internal.sql.engine.exec.ddl.DdlCommandHandler.handle(DdlCommandHandler.java:92)
>     at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.executeDdl(ExecutionServiceImpl.java:249)
>     at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.executePlan(ExecutionServiceImpl.java:229)
>     at 
> org.apache.ignite.internal.sql.engine.SqlQueryProcessor.lambda$query0$13(SqlQueryProcessor.java:424)
>     at 
> org.apache.ignite.internal.sql.engine.SqlQueryProcessor$$Lambda$2239/0x0008007f9440.apply(Unknown
>  Source)
>     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(java.base@11.0.8/CompletableFuture.java:642)
>     at 
> java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.8/CompletableFuture.java:506)
>     at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(java.base@11.0.8/CompletableFuture.java:1705)
>     at 
> org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:80)
>     at 
> org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$2242/0x0008009b4840.run(Unknown
>  Source)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.8/ThreadPoolExecutor.java:1128)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.8/ThreadPoolExecutor.java:628)
>     at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)





[jira] [Created] (IGNITE-17557) Test ItPublicApiColocationTest hangs

2022-08-19 Thread Yury Gerzhedovich (Jira)
Yury Gerzhedovich created IGNITE-17557:
--

 Summary: Test ItPublicApiColocationTest hangs
 Key: IGNITE-17557
 URL: https://issues.apache.org/jira/browse/IGNITE-17557
 Project: Ignite
  Issue Type: Improvement
Reporter: Yury Gerzhedovich


Periodically the test ItPublicApiColocationTest hangs on createTable. It is 
easily reproducible on 
[TC|https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_ModuleRunner?branch=%3Cdefault%3E=overview=builds],
 but I can't reproduce it on a local machine.

Let's investigate the reason and fix it.


  "%ItPublicApiColocationTest_null_0%sql-execution-pool-1" #4948 daemon prio=5 
os_prio=0 cpu=1353.84ms elapsed=3234.89s tid=0x7fe606d95000 nid=0x33af 
waiting on condition  [0x7fe0c293c000]
     java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
    - parking to wait for  <0x000741bd4a08> (a 
java.util.concurrent.CompletableFuture$Signaller)
    at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.8/LockSupport.java:194)
    at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.8/CompletableFuture.java:1796)
    at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.8/ForkJoinPool.java:3128)
    at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.8/CompletableFuture.java:1823)
    at 
java.util.concurrent.CompletableFuture.join(java.base@11.0.8/CompletableFuture.java:2043)
    at 
org.apache.ignite.internal.table.distributed.TableManager.join(TableManager.java:1359)
    at 
org.apache.ignite.internal.table.distributed.TableManager.createTable(TableManager.java:812)
    at 
org.apache.ignite.internal.sql.engine.exec.ddl.DdlCommandHandler.handleCreateTable(DdlCommandHandler.java:158)
    at 
org.apache.ignite.internal.sql.engine.exec.ddl.DdlCommandHandler.handle(DdlCommandHandler.java:92)
    at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.executeDdl(ExecutionServiceImpl.java:249)
    at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.executePlan(ExecutionServiceImpl.java:229)
    at 
org.apache.ignite.internal.sql.engine.SqlQueryProcessor.lambda$query0$13(SqlQueryProcessor.java:424)
    at 
org.apache.ignite.internal.sql.engine.SqlQueryProcessor$$Lambda$2239/0x0008007f9440.apply(Unknown
 Source)
    at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(java.base@11.0.8/CompletableFuture.java:642)
    at 
java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.8/CompletableFuture.java:506)
    at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(java.base@11.0.8/CompletableFuture.java:1705)
    at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:80)
    at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$2242/0x0008009b4840.run(Unknown
 Source)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.8/ThreadPoolExecutor.java:1128)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.8/ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)
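
The dump above shows the thread parked in CompletableFuture.join with no time limit. While investigating, one option is to bound the wait so that a hang surfaces as a failure with a message instead of a stuck build. The following is only a minimal sketch under that assumption: BoundedJoinSketch and joinWithTimeout are hypothetical names, not part of Ignite, and this does not address the root cause.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not part of Ignite: a join that cannot wait forever. */
class BoundedJoinSketch {
    /** Waits for the future at most the given time and fails loudly otherwise. */
    static <T> T joinWithTimeout(CompletableFuture<T> fut, long timeout, TimeUnit unit) {
        try {
            return fut.get(timeout, unit);
        }
        catch (TimeoutException e) {
            // The future was never completed: report it instead of parking forever.
            throw new IllegalStateException("Future was not completed within " + timeout + " " + unit, e);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the future", e);
        }
        catch (ExecutionException e) {
            // Keep join()-like semantics: rethrow the original cause unchecked.
            throw new CompletionException(e.getCause());
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();

        try {
            joinWithTimeout(neverCompleted, 100, TimeUnit.MILLISECONDS);
        }
        catch (IllegalStateException e) {
            System.out.println("Detected hang: " + e.getMessage());
        }
    }
}
```

With a bounded wait, the test would fail at the point where the future is not completed, which may make it easier to see which component never completes it.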





[jira] [Commented] (IGNITE-17542) Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 became flaky after IGNITE-17507

2022-08-19 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581776#comment-17581776
 ] 

Ignite TC Bot commented on IGNITE-17542:


{panel:title=Branch: [pull/10201/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10201/head] Base: [master] : No new tests 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=6737235buildTypeId=IgniteTests24Java8_RunAll]

> Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 
> became flaky after IGNITE-17507
> 
>
> Key: IGNITE-17542
> URL: https://issues.apache.org/jira/browse/IGNITE-17542
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.14
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
> Fix For: 2.14
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test 
> CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 
> became flaky due to IGNITE-17507.
> The root cause of the issue is that _CacheAffinityChangeMessage_ mutates the 
> message outside the _disco-notifier_ thread, which may lead to the 
> following exception:
> {noformat}
> [2022-08-16T21:10:32,133][ERROR][tcp-disco-msg-worker-[0448095b 
> 127.0.0.1:47502]-#5308%distributed.CacheLateAffinityAssignmentTest3%-#98199%distributed.CacheLateAffinityAssignmentTest3%][TestTcpDiscoverySpi]
>  TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node 
> in order to prevent cluster wide instability.
>   org.apache.ignite.IgniteException: Failed to marshal mutable discovery 
> message: CacheAffinityChangeMessage 
> [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, 
> exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, 
> minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], 
> partsMsg=GridDhtPartitionsFullMessage [parts=HashMap 
> {-2100569601=GridDhtPartitionFullMap 
> {f57cbb85-44ba-40d1-814e-937f96c3=GridDhtPartitionMap [moving=0, 
> top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=111, 
> size=100], 0448095b-02d8-470c-ab90-6a5bcf82=GridDhtPartitionMap 
> [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], 
> updateSeq=116, size=100]}, 1251687457=GridDhtPartitionFullMap 
> {f57cbb85-44ba-40d1-814e-937f96c3=GridDhtPartitionMap [moving=0, 
> top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=1035, 
> size=1024], 0448095b-02d8-470c-ab90-6a5bcf82=GridDhtPartitionMap 
> [moving=0, top=AffinityTopologyVersion [topVer=4, minorTopVer=0], 
> updateSeq=3, size=0]}}, partCntrs=IgniteDhtPartitionCountersMap [], 
> partCntrs2=null, partHistSuppliers=IgniteDhtPartitionHistorySuppliersMap [], 
> partsToReload=IgniteDhtPartitionsToReloadMap [], 
> topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], errs=null, 
> resTopVer=null, flags=0, partCnt=2, super=GridDhtPartitionsAbstractMessage 
> [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, 
> minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], 
> lastVer=GridCacheVersion [topVer=0, order=1660673425660, nodeOrder=0, 
> dataCenterId=0], super=GridCacheMessage [msgId=-1, depInfo=null, 
> lastAffChangedTopVer=null, err=null, skipPrepare=false]]], 
> exchangeNeeded=false, stopProc=false]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6423)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:6243)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3260)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2918)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8058)
>  ~[classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3089)
>  [classes/:?]
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) 
> [classes/:?]
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7989)
>  [classes/:?]
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58) 
> [classes/:?]
>   Caused by: org.apache.ignite.IgniteCheckedException: Failed to serialize 
> object: CacheAffinityChangeMessage 
> 

[jira] [Assigned] (IGNITE-17498) Update HeapLockManager in order to support Intention locks

2022-08-19 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-17498:
--

Assignee: Sergey Uttsel  (was: Denis Chudov)

> Update HeapLockManager in order to support Intention locks
> --
>
> Key: IGNITE-17498
> URL: https://issues.apache.org/jira/browse/IGNITE-17498
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3, transaction3_rw
>
> It's required to implement new lock upgrade logic that will consider not only 
> S and X locks but also IS, IX and SIX.
>  
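
For illustration, here is a minimal sketch of how an upgrade between these modes could be computed. It assumes the textbook multi-granularity locking lattice, in which the upgraded mode is the weakest mode covering both the held and the requested one. The ticket does not specify this. LockUpgradeSketch, Mode and upgrade are hypothetical names, not the HeapLockManager API.

```java
/** Hypothetical sketch of lock upgrade over IS, IX, S, SIX and X; not the Ignite API. */
class LockUpgradeSketch {
    /** Each mode is encoded as the set of rights it grants (bit mask). */
    enum Mode {
        IS(0b0001), IX(0b0011), S(0b0101), SIX(0b0111), X(0b1111);

        final int rights;

        Mode(int rights) {
            this.rights = rights;
        }
    }

    /** Returns the weakest mode that grants the rights of both modes. */
    static Mode upgrade(Mode held, Mode requested) {
        int union = held.rights | requested.rights;

        for (Mode m : Mode.values()) {
            if (m.rights == union)
                return m;
        }

        // Unreachable for the five modes above: their rights are closed under union.
        throw new IllegalStateException("No mode for " + held + " + " + requested);
    }

    public static void main(String[] args) {
        System.out.println(upgrade(Mode.IX, Mode.S));  // SIX
        System.out.println(upgrade(Mode.IS, Mode.IX)); // IX
        System.out.println(upgrade(Mode.SIX, Mode.X)); // X
    }
}
```

The only case where the result differs from both inputs is IX combined with S, which yields SIX. That is the case an upgrade logic limited to S and X cannot express.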





[jira] [Updated] (IGNITE-17457) Cluster locks after the transaction recovery procedure if the tx primary node fail

2022-08-19 Thread Sergey Korotkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Korotkov updated IGNITE-17457:
-
Ignite Flags: Release Notes Required

> Cluster locks after the transaction recovery procedure if the tx primary node 
> fail
> --
>
> Key: IGNITE-17457
> URL: https://issues.apache.org/jira/browse/IGNITE-17457
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Korotkov
>Assignee: Sergey Korotkov
>Priority: Major
> Fix For: 2.14
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> An Ignite cluster may lock up (all client operations block) after the tx 
> recovery procedure is executed following the failure of the tx near & primary 
> node.
> The prepared transaction may remain uncommitted on the backup node after the 
> tx recovery, so the partition exchange cannot complete and the cluster stays 
> locked.
> 
> The immediate reason is a race condition in the method:
> {code:java}
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter::markFinalizing(RECOVERY_FINISH){code}
> If 2 or more backups are configured, it may be called concurrently for the 
> same transaction both from the recovery procedure:
> {code:java}
> IgniteTxManager::commitIfPrepared{code}
> and from the tx recovery request handler:
> {code:java}
> IgniteTxHandler::processCheckPreparedTxRequest{code}
> The problem occurs if the thread context is switched between reading the old 
> finalization status and updating the status.
> 
> The problematic sequence of events is as follows (the lock occurs on 
> node1):
> 1. Start a cluster with 3 nodes (node0, node1, node2) and a cache with 2 
> backups.
> 2. On node2, start and prepare a transaction, choosing a key whose primary 
> partition is stored on node2.
> 3. Kill node2.
> 4. The tx recovery procedure is started on both node0 and node1.
> 5. As part of the recovery procedure, node0 sends a tx recovery request to 
> node1.
> 6. The following steps are executed on node1 in two threads ("procedure", 
> which is a system pool thread executing the tx recovery procedure, and 
> "handler", which is a striped pool thread processing the tx recovery request 
> sent from node0):
>  - tx.finalization == NONE
>  - "procedure": calls markFinalizing(RECOVERY_FINISH)
>  - "handler": calls markFinalizing(RECOVERY_FINISH)
>  - "procedure": gets old tx.finalization - it's NONE
>  - "handler": gets old tx.finalization - it's NONE
>  - "handler": updates tx.finalization - now it's RECOVERY_FINISH
>  - "procedure": tries to update tx.finalization via compareAndSet and fails 
> because the comparison fails.
>  - "procedure": stops transaction processing and does not try to commit it.
>  - The transaction remains unfinished on node1.
> 
> The reproducer is in the pull request.
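
The interleaving from step 6 can be replayed deterministically on a single thread. The sketch below is not the Ignite code and not the reproducer from the pull request. It only models tx.finalization as an AtomicReference (an assumption) and performs the reads and compareAndSet calls in the order listed above. FinalizationRaceSketch and replay are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-threaded replay of the described interleaving; not the Ignite code. */
class FinalizationRaceSketch {
    enum Finalization { NONE, RECOVERY_FINISH }

    /** Returns {handlerMarked, procedureMarked} for the interleaving from the ticket. */
    static boolean[] replay() {
        AtomicReference<Finalization> finalization = new AtomicReference<>(Finalization.NONE);

        // Both threads read the old status before either of them updates it.
        Finalization oldSeenByProcedure = finalization.get(); // NONE
        Finalization oldSeenByHandler = finalization.get();   // NONE

        // "handler" updates first and succeeds.
        boolean handlerMarked = finalization.compareAndSet(oldSeenByHandler, Finalization.RECOVERY_FINISH);

        // "procedure" now expects NONE but finds RECOVERY_FINISH, so the update fails.
        boolean procedureMarked = finalization.compareAndSet(oldSeenByProcedure, Finalization.RECOVERY_FINISH);

        return new boolean[] {handlerMarked, procedureMarked};
    }

    public static void main(String[] args) {
        boolean[] res = replay();

        System.out.println("handler marked: " + res[0]);   // true
        System.out.println("procedure marked: " + res[1]); // false
    }
}
```

In this model the "procedure" side observes a failed update even though the status it wanted is already set. That is the point at which, according to the description, it stops processing the transaction.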





[jira] [Assigned] (IGNITE-17505) Document CREATE INDEX command

2022-08-19 Thread Igor Gusev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Gusev reassigned IGNITE-17505:
---

Assignee: Igor Gusev

> Document CREATE INDEX command
> -
>
> Key: IGNITE-17505
> URL: https://issues.apache.org/jira/browse/IGNITE-17505
> Project: Ignite
>  Issue Type: Task
>  Components: documentation
>Reporter: Igor Gusev
>Assignee: Igor Gusev
>Priority: Major
>  Labels: ignite-3
>
> In the https://issues.apache.org/jira/browse/IGNITE-17429 ticket, a new 
> CREATE INDEX command was added. We need to document it





[jira] [Updated] (IGNITE-17457) Cluster locks after the transaction recovery procedure if the tx primary node fail

2022-08-19 Thread Sergey Korotkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Korotkov updated IGNITE-17457:
-
Ignite Flags:   (was: Release Notes Required)

> Cluster locks after the transaction recovery procedure if the tx primary node 
> fail
> --
>
> Key: IGNITE-17457
> URL: https://issues.apache.org/jira/browse/IGNITE-17457
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Korotkov
>Assignee: Sergey Korotkov
>Priority: Major
> Fix For: 2.14
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h





[jira] [Updated] (IGNITE-17457) Cluster locks after the transaction recovery procedure if the tx primary node fail

2022-08-19 Thread Sergey Korotkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Korotkov updated IGNITE-17457:
-
Release Note: Fixed potential deadlock in transactions recovery on node 
failure.

> Cluster locks after the transaction recovery procedure if the tx primary node 
> fail
> --
>
> Key: IGNITE-17457
> URL: https://issues.apache.org/jira/browse/IGNITE-17457
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Korotkov
>Assignee: Sergey Korotkov
>Priority: Major
> Fix For: 2.14
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h





[jira] [Updated] (IGNITE-17354) Metrics framework

2022-08-19 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-17354:
-
Fix Version/s: 3.0.0-alpha6

> Metrics framework 
> --
>
> Key: IGNITE-17354
> URL: https://issues.apache.org/jira/browse/IGNITE-17354
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Denis Chudov
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-alpha6
>
>
> *Metrics types*
> The metrics framework should provide the following metric types:
> - Gauge - an instantaneous measurement of a value provided by some 
> existing component. Gauge should support the primitive types int, long and 
> double.
> - Metric - a wrapper around a numeric value that can be increased or 
> decreased to some value. Metric should support the primitive types int, long 
> and double.
> - Hit Rate - accumulates approximate hit rate statistics based on hits in the 
> last time interval.
> - Distribution - distributes values across buckets, where each bucket 
> represents some numeric interval (Histogram in AI 2). Internal type: 
> primitive long (should be enough).
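
As an illustration of the Distribution type, here is a minimal sketch that counts values per bucket using primitive long counters. The bucket semantics (ascending exclusive upper bounds plus one overflow bucket) and the names DistributionSketch, add and count are assumptions made for the example, not part of the proposal.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical distribution metric: counts values per numeric interval. */
class DistributionSketch {
    /** Ascending upper bounds of the buckets (exclusive). */
    private final long[] bounds;

    /** One counter per bucket plus an overflow bucket for values >= the last bound. */
    private final AtomicLongArray hits;

    DistributionSketch(long... bounds) {
        this.bounds = bounds.clone();
        this.hits = new AtomicLongArray(bounds.length + 1);
    }

    /** Adds a value to the bucket whose interval contains it. */
    void add(long value) {
        int bucket = 0;

        while (bucket < bounds.length && value >= bounds[bucket])
            bucket++;

        hits.incrementAndGet(bucket);
    }

    /** Returns the number of values recorded in the given bucket. */
    long count(int bucket) {
        return hits.get(bucket);
    }

    public static void main(String[] args) {
        // Buckets: [.., 100), [100, 500), [500, ..)
        DistributionSketch d = new DistributionSketch(100, 500);

        d.add(5);
        d.add(99);
        d.add(100);
        d.add(1000);

        System.out.println(d.count(0) + " " + d.count(1) + " " + d.count(2)); // 2 1 1
    }
}
```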
> *Concurrency characteristics*
> For scalar numeric metrics it is enough to have an atomic number (e.g. 
> AtomicInteger) and a striped number (e.g. LongAdder). These approaches affect 
> memory footprint and performance differently.
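
To make the two options concrete, the sketch below increments an AtomicLong and a LongAdder from several threads. Both end up with the same total. According to the JDK documentation, LongAdder is expected to give higher throughput under contention at the cost of higher space consumption. ScalarMetricSketch and countConcurrently are hypothetical names used only for this example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical comparison of an atomic and a striped counter. */
class ScalarMetricSketch {
    /** Returns {atomic total, striped total} after concurrent increments. */
    static long[] countConcurrently(int threads, int incrementsPerThread) {
        AtomicLong atomic = new AtomicLong(); // single cell, contended CAS
        LongAdder striped = new LongAdder();  // several cells, summed on read

        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    atomic.incrementAndGet();
                    striped.increment();
                }
            });

            workers[t].start();
        }

        try {
            for (Thread w : workers)
                w.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }

        return new long[] {atomic.get(), striped.sum()};
    }

    public static void main(String[] args) {
        long[] totals = countConcurrently(4, 10_000);

        System.out.println(totals[0] + " " + totals[1]); // 40000 40000
    }
}
```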
> *Design*
> Metrics should have the same life cycle as the component that produces 
> them. So metrics related to a particular component should be tied together 
> in a MetricsSet. The only purpose of a metrics set is to provide exporters 
> with access to metric values. The metric instances themselves are placed in a 
> MetricsSource - an entity which keeps instances of metrics and provides 
> access to the metrics through an interface that is specific to each metrics 
> source. A component that produces metrics must control the life cycle of its 
> metrics source (create it and register it in the metrics registry, see below).
> All metrics sources (regardless of whether metrics are enabled or disabled 
> for a particular metrics source) must be registered in the metrics registry 
> on component start and removed on component stop.
> MetricsSource itself produces an instance of MetricsSet, which should be 
> registered in the metrics registry on the "metrics enabled" event and 
> unregistered on the "metrics disabled" event.
> The metrics registry provides exporters with access to all metrics sets.
> It is possible that the metrics registry is overloaded with functionality 
> (managing metrics sources and metrics sets), so a special component is 
> probably needed for these purposes (e.g. a metrics manager).
> Each metric instance has a name (local to its metrics set) and a 
> description. So the full metric name is the concatenation of the metrics 
> source name and the metric name, separated by a dot.
> For composite metrics like distribution we should treat each metric inside 
> (e.g. each range) as a separate metric. So the full name of each internal 
> metric will be metrics source + dot + metric instance name + dot + range as 
> string (e.g. 0_100).
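
The naming rule can be written down as a small sketch. MetricNameSketch, fullName and rangeName are hypothetical names, and the source and metric names in the example ("tx", "commitTime") are made up for illustration.

```java
/** Hypothetical helper that builds full metric names as described above. */
class MetricNameSketch {
    /** Full name: metrics source name + dot + metric name. */
    static String fullName(String sourceName, String metricName) {
        return sourceName + '.' + metricName;
    }

    /** Full name of one range of a composite metric: source + dot + metric + dot + range. */
    static String rangeName(String sourceName, String metricName, long from, long to) {
        return fullName(sourceName, metricName) + '.' + from + '_' + to;
    }

    public static void main(String[] args) {
        System.out.println(fullName("tx", "commits"));              // tx.commits
        System.out.println(rangeName("tx", "commitTime", 0, 100));  // tx.commitTime.0_100
    }
}
```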
> A metrics set must be immutable in order to meet the requirements described 
> in the epic.
> The data structure (likely a map) that is responsible for keeping enabled 
> metrics sets should be modified using copy-on-write semantics in order to 
> avoid data races between the main functionality (metrics enabling/disabling) 
> and exporters.
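
A minimal sketch of the copy-on-write idea follows, assuming Java 10 or later for Map.copyOf. Every enable or disable call publishes a new immutable map, so an exporter that has already taken a snapshot keeps iterating over an unchanged view. CowRegistrySketch and its methods are hypothetical names, and the metrics set type is left generic because its shape is not defined here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical copy-on-write holder of enabled metrics sets, keyed by source name. */
class CowRegistrySketch<S> {
    /** The current immutable map; replaced as a whole on every change. */
    private final AtomicReference<Map<String, S>> enabled = new AtomicReference<>(Map.of());

    /** Called on the "metrics enabled" event. */
    void enable(String sourceName, S metricsSet) {
        enabled.updateAndGet(old -> {
            Map<String, S> copy = new HashMap<>(old);

            copy.put(sourceName, metricsSet);

            return Map.copyOf(copy);
        });
    }

    /** Called on the "metrics disabled" event. */
    void disable(String sourceName) {
        enabled.updateAndGet(old -> {
            Map<String, S> copy = new HashMap<>(old);

            copy.remove(sourceName);

            return Map.copyOf(copy);
        });
    }

    /** Exporters read an immutable snapshot and never see a partially updated map. */
    Map<String, S> snapshot() {
        return enabled.get();
    }

    public static void main(String[] args) {
        CowRegistrySketch<String> registry = new CowRegistrySketch<>();

        registry.enable("tx", "txMetricsSet");

        Map<String, String> seenByExporter = registry.snapshot();

        registry.disable("tx");

        System.out.println(seenByExporter.containsKey("tx")); // true
        System.out.println(registry.snapshot().isEmpty());    // true
    }
}
```

Writers pay for a full copy on each change, which seems acceptable here because enabling and disabling metrics is rare compared to reads by exporters.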





[jira] [Assigned] (IGNITE-17452) [Extensions] Implement Kafka to thin client CDC streamer

2022-08-19 Thread Amelchev Nikita (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amelchev Nikita reassigned IGNITE-17452:


Assignee: Amelchev Nikita

> [Extensions] Implement Kafka to thin client CDC streamer
> 
>
> Key: IGNITE-17452
> URL: https://issues.apache.org/jira/browse/IGNITE-17452
> Project: Ignite
>  Issue Type: Task
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: IEP-59, ise
>
> Implement Kafka to thin client CDC streamer


