[jira] [Assigned] (IGNITE-21381) ActiveActorTest#testChangeLeaderForce has problems with resource cleanup

2024-03-11 Thread Vladislav Pyatkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-21381:
--

Assignee: Vladislav Pyatkov

> ActiveActorTest#testChangeLeaderForce has problems with resource cleanup
> 
>
> Key: IGNITE-21381
> URL: https://issues.apache.org/jira/browse/IGNITE-21381
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: screenshot-1.png, screenshot-2.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{ActiveActorTest#testChangeLeaderForce}} has become flaky on TC, failing with 
> {noformat}
> [05:19:12]F:   
> [org.apache.ignite.internal.placementdriver.ActiveActorTest.testChangeLeaderForce(TestInfo)]
>  org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>   at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
>   at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
>   at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
>   at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:180)
>   at 
> app//org.apache.ignite.internal.placementdriver.ActiveActorTest.testChangeLeaderForce(ActiveActorTest.java:370)
> {noformat}
> From the log we can see that the leadership transfer, which was supposed to 
> succeed, does not happen. The behaviour is as follows:
> 1) Current leader is {{Leader: ClusterNodeImpl 
> [id=e99210fb-f872-4e08-a99c-53f9512da20e, name=aat_tclf_1235}}
> 2) We want to transfer leadership to {{Peer to transfer leader: Peer 
> [consistentId=aat_tclf_1234, idx=0]}}
> 3) The transfer process starts
> 4) We receive a warning about an error during {{GetLeaderRequestImpl}}:
> {noformat}
> [2024-01-29T05:19:08,855][WARN 
> ][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable error 
> during the request occurred (will be retried on the randomly selected node) 
> [request=GetLeaderRequestImpl [groupId=TestReplicationGroup, 
> peerId=aat_tclf_1235], peer=Peer [consistentId=aat_tclf_1235, idx=0], 
> newPeer=Peer [consistentId=aat_tclf_1234, idx=0]].
> java.util.concurrent.CompletionException: 
> java.util.concurrent.TimeoutException
>   at 
> java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
>  ~[?:?]
>   at 
> java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
>  ~[?:?]
>   at 
> java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
>  ~[?:?]
>   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  [?:?]
>   at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>  [?:?]
>   at 
> java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
>  [?:?]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>  [?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.util.concurrent.TimeoutException
>   ... 7 more
> {noformat}
> 5) After that we see that node {{aat_tclf_1236}} sends an invalid 
> {{RequestVoteResponse}} because it thinks that it is the leader:
> {noformat}
> [2024-01-29T05:19:11,370][WARN 
> ][%aat_tclf_1234%JRaft-Response-Processor-15][NodeImpl] Node 
>  received invalid RequestVoteResponse 
> from aat_tclf_1236, state not in STATE_CANDIDATE but STATE_LEADER.
> {noformat}
>  
> Tests {{ActiveActorTest#testChangeLeaderForce}} and 
> {{TopologyAwareRaftGroupServiceTest#testChangeLeaderForce}} were muted.
> There are also other problems with these tests: they do not clean up 
> resources correctly in case of failure. The cluster is stopped in the test 
> body itself, so if an assertion fails, the rest of the test is not executed 
> and the cluster is never stopped.
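
The cleanup problem described above can be illustrated with a minimal sketch. {{FakeCluster}} and {{CleanupSketch}} are hypothetical names used only for illustration and are not Ignite classes. Stopping the cluster in a {{finally}} block (or, in a JUnit 5 test, in an {{@AfterEach}} method) lets the stop run even when an assertion fails:

```java
// Hypothetical stand-in for the cluster started by the test; not an Ignite class.
class FakeCluster {
    private boolean running = true;

    boolean isRunning() {
        return running;
    }

    void stop() {
        running = false;
    }
}

class CleanupSketch {
    /** Mirrors the layout described in the ticket: the stop call is skipped if the check throws. */
    static void stopInTestBody(FakeCluster cluster, boolean leaderChanged) {
        check(leaderChanged);
        cluster.stop(); // never reached when the check above fails
    }

    /** The stop call sits in a finally block, so it runs whether or not the check throws. */
    static void stopAlways(FakeCluster cluster, boolean leaderChanged) {
        try {
            check(leaderChanged);
        } finally {
            cluster.stop();
        }
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("leader was not changed");
        }
    }
}
```

With a failing check, {{stopInTestBody}} leaves the cluster running, while {{stopAlways}} stops it before the error propagates.
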
> The next problem is that if we run this test several times, even if the runs 
> pass successfully, at some point a new test cannot be run 
> because of 
> {noformat}
>  java.lang.OutOfMemoryError: unable 

[jira] [Created] (IGNITE-21733) Improve validation of SchemaDescriptor

2024-03-11 Thread Konstantin Orlov (Jira)
Konstantin Orlov created IGNITE-21733:
-

 Summary: Improve validation of SchemaDescriptor
 Key: IGNITE-21733
 URL: https://issues.apache.org/jira/browse/IGNITE-21733
 Project: Ignite
  Issue Type: Improvement
Reporter: Konstantin Orlov


Currently, {{SchemaDescriptor}} has no validation apart from a handful of 
assertions in its constructor. Let's revise this approach and introduce proper 
validation to improve the integrity of the system.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20384) Clean up abandoned resources for destroyed tables in catalog

2024-03-11 Thread Andrey Mashenkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov updated IGNITE-20384:
--
Fix Version/s: 3.0.0-beta2

> Clean up abandoned resources for destroyed tables in catalog
> 
>
> Key: IGNITE-20384
> URL: https://issues.apache.org/jira/browse/IGNITE-20384
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Andrey Mashenkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> We need to clean up abandoned resources (from vault and metastore) for 
> destroyed tables from the catalog.
> Perhaps this will be split into two separate tickets.





[jira] [Updated] (IGNITE-20287) Clean up abandoned resources for destroyed zones in catalog

2024-03-11 Thread Andrey Mashenkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov updated IGNITE-20287:
--
Fix Version/s: 3.0.0-beta2

> Clean up abandoned resources for destroyed zones in catalog
> ---
>
> Key: IGNITE-20287
> URL: https://issues.apache.org/jira/browse/IGNITE-20287
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> h3. *Motivation*
> We need to clean up resources for destroyed distribution zones from the 
> catalog. While a zone is being removed, some actions that must be performed 
> on zone deletion could be interrupted by a restart. On recovery, we must 
> detect such a deletion and clean up the resources. Currently we store some 
> zone state in Meta Storage, so these resources must be cleaned up.
> h3. *Definition of done*
> Resources for a deleted zone are removed even if the removal was 
> interrupted by a restart.
> h3. *Implementation notes*
> For zones that are not present in the catalog but are present in the MS, 
> just remove all keys related to data nodes. All these changes must be made 
> using the meta storage condition that we use when we call 
> {{DistributionZoneManager#removeTriggerKeysAndDataNodes}} on a zone drop.
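
The detection step from the implementation notes could look roughly like the sketch below. It assumes that zone ids are integers and that the ids known to the catalog and to the meta storage have already been collected into sets; {{OrphanedZones}} is a hypothetical helper, not an Ignite class. The actual key removal, guarded by the meta storage condition, is not shown.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper used only for illustration.
class OrphanedZones {
    /** Returns the ids of zones that still have keys in the meta storage but are absent from the catalog. */
    static Set<Integer> find(Set<Integer> catalogZoneIds, Set<Integer> metaStorageZoneIds) {
        Set<Integer> orphaned = new TreeSet<>(metaStorageZoneIds);
        orphaned.removeAll(catalogZoneIds);
        return orphaned;
    }
}
```
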





[jira] [Created] (IGNITE-21732) Sql. Split TableRowConverterImpl on two different implementations

2024-03-11 Thread Konstantin Orlov (Jira)
Konstantin Orlov created IGNITE-21732:
-

 Summary: Sql. Split TableRowConverterImpl on two different 
implementations
 Key: IGNITE-21732
 URL: https://issues.apache.org/jira/browse/IGNITE-21732
 Project: Ignite
  Issue Type: Improvement
  Components: sql
Reporter: Konstantin Orlov


Currently, {{TableRowConverterImpl}} implements two conversion strategies: 
with and without field trimming. To make the code simpler and to remove 
branching from a hot path, let's split this class in two (one per strategy).





[jira] [Assigned] (IGNITE-18258) Exception on cast to decimal and numeric in SQL

2024-03-11 Thread Maksim Zhuravkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maksim Zhuravkov reassigned IGNITE-18258:
-

Assignee: Maksim Zhuravkov

> Exception on cast to decimal and numeric in SQL
> ---
>
> Key: IGNITE-18258
> URL: https://issues.apache.org/jira/browse/IGNITE-18258
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Reporter: Pavel Tupitsyn
>Assignee: Maksim Zhuravkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> *Query*
> {code}
> select (cast(_T0.KEY as decimal) / ?), cast(_T0.KEY as numeric) from 
> PUBLIC.TBL_INT32 as _T0
> {code}
> *Result*
> {code}
> org.apache.ignite.lang.IgniteException: IGN-CMN-65535 
> TraceId:9b69e26a-0d1e-4891-82bb-f164919a323c For conversion to decimal, 
> ConverterUtils#convertToDecimal method should be used instead.
>   at org.apache.ignite.lang.IgniteException.wrap(IgniteException.java:289)
>   at 
> org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.lambda$requestNextAsync$0(AsyncSqlCursorImpl.java:77)
>   at 
> java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
>   at 
> java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
>   at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>   at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>   at 
> org.apache.ignite.internal.sql.engine.exec.rel.AsyncRootNode.lambda$closeAsync$0(AsyncRootNode.java:193)
>   at 
> java.base/java.util.concurrent.LinkedBlockingQueue.forEachFrom(LinkedBlockingQueue.java:1010)
>   at 
> java.base/java.util.concurrent.LinkedBlockingQueue.forEach(LinkedBlockingQueue.java:979)
>   at 
> org.apache.ignite.internal.sql.engine.exec.rel.AsyncRootNode.closeAsync(AsyncRootNode.java:193)
>   at 
> org.apache.ignite.internal.sql.engine.exec.rel.AsyncRootNode.onError(AsyncRootNode.java:148)
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$acknowledgeFragment$1(ExecutionServiceImpl.java:453)
>   at 
> java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753)
>   at 
> java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731)
>   at 
> java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108)
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.acknowledgeFragment(ExecutionServiceImpl.java:452)
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.onMessage(ExecutionServiceImpl.java:310)
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.lambda$start$3(ExecutionServiceImpl.java:183)
>   at 
> org.apache.ignite.internal.sql.engine.message.MessageServiceImpl.onMessageInternal(MessageServiceImpl.java:164)
>   at 
> org.apache.ignite.internal.sql.engine.message.MessageServiceImpl.lambda$onMessage$1(MessageServiceImpl.java:135)
>   at 
> org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:80)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.AssertionError: For conversion to decimal, 
> ConverterUtils#convertToDecimal method should be used instead.
>   at 
> org.apache.ignite.internal.sql.engine.exec.exp.ConverterUtils.convert(ConverterUtils.java:222)
>   at 
> org.apache.ignite.internal.sql.engine.exec.exp.ConverterUtils.convert(ConverterUtils.java:201)
>   at 
> org.apache.ignite.internal.sql.engine.exec.exp.RexToLixTranslator.visitDynamicParam(RexToLixTranslator.java:1249)
>   at 
> org.apache.ignite.internal.sql.engine.exec.exp.RexToLixTranslator.visitDynamicParam(RexToLixTranslator.java:80)
>   at 
> org.apache.calcite.rex.RexDynamicParam.accept(RexDynamicParam.java:60)
>   at 
> org.apache.ignite.internal.sql.engine.exec.exp.RexToLixTranslator.visitLocalRef(RexToLixTranslator.java:983)
>   at 
> org.apache.ignite.internal.sql.engine.exec.exp.RexToLixTranslator.visitLocalRef(RexToLixTranslator.java:80)
>   at org.apache.calcite.rex.RexLocalRef.accept(RexLocalRef.java:77)
>   at 
> org.apache.ignite.internal.sql.engine.exec.exp.RexToLixTranslator.implementCallOperand(RexToLixTranslator.java:1106)
>   at 
> 

[jira] [Created] (IGNITE-21731) Sql. Split TableRowConverter#toBinaryRow on two methods

2024-03-11 Thread Konstantin Orlov (Jira)
Konstantin Orlov created IGNITE-21731:
-

 Summary: Sql. Split TableRowConverter#toBinaryRow on two methods
 Key: IGNITE-21731
 URL: https://issues.apache.org/jira/browse/IGNITE-21731
 Project: Ignite
  Issue Type: Improvement
  Components: sql
Reporter: Konstantin Orlov


Currently, the method 
{{org.apache.ignite.internal.sql.engine.exec.TableRowConverter#toBinaryRow}} 
accepts a boolean flag {{key}} and creates a row according to this flag. 
Perhaps the API will be cleaner if we split this method in two: 
{{toKeyRow}} and {{toFullRow}}. 
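
A rough sketch of the proposed split is shown below. It does not use the real {{TableRowConverter}} signatures: rows are modelled as plain lists, the first {{keySize}} columns are assumed to form the key, and {{RowConverterSketch}} is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified converter: a row is a list whose first keySize columns form the key.
class RowConverterSketch {
    private final int keySize;

    RowConverterSketch(int keySize) {
        this.keySize = keySize;
    }

    /** Flag-based shape: the caller has to know what the boolean means. */
    List<Object> toBinaryRow(List<Object> row, boolean key) {
        return key ? toKeyRow(row) : toFullRow(row);
    }

    /** Returns only the key columns. */
    List<Object> toKeyRow(List<Object> row) {
        return new ArrayList<>(row.subList(0, keySize));
    }

    /** Returns all columns. */
    List<Object> toFullRow(List<Object> row) {
        return new ArrayList<>(row);
    }
}
```

Call sites then read {{converter.toKeyRow(row)}} instead of {{converter.toBinaryRow(row, true)}}, which makes the intent visible without looking up the flag.
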





[jira] [Created] (IGNITE-21730) Start tables aside of Metastorage event

2024-03-11 Thread Andrey Mashenkov (Jira)
Andrey Mashenkov created IGNITE-21730:
-

 Summary: Start tables aside of Metastorage event
 Key: IGNITE-21730
 URL: https://issues.apache.org/jira/browse/IGNITE-21730
 Project: Ignite
  Issue Type: Improvement
Reporter: Andrey Mashenkov


Currently, we start raft groups and storages asynchronously, but as part of the 
event handling flow. Thus the metastorage watcher can't proceed with the next 
event until the current one has been handled (raft groups have been started and 
storages have been created).
This unwanted long operation affects lease updates and leads to metastorage errors.
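
The difference between the two flows can be sketched as follows. This is only an illustration under stated assumptions: {{TableStarter}} is a hypothetical class, the watcher is assumed to wait on the future returned by the event handler, and the slow start is modelled as a {{Runnable}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch; the names do not correspond to Ignite classes.
class TableStarter {
    private final Executor executor;

    // Tracks the background start so that later operations can wait for it if they need to.
    private volatile CompletableFuture<Void> startFuture = CompletableFuture.completedFuture(null);

    TableStarter(Executor executor) {
        this.executor = executor;
    }

    /** Event-bound flow: the watcher's future completes only when the slow start has finished. */
    CompletableFuture<Void> onEventInline(Runnable slowStart) {
        return CompletableFuture.runAsync(slowStart, executor);
    }

    /** Aside flow: the start is tracked separately and the watcher gets a completed future at once. */
    CompletableFuture<Void> onEventAside(Runnable slowStart) {
        startFuture = CompletableFuture.runAsync(slowStart, executor);
        return CompletableFuture.completedFuture(null);
    }

    CompletableFuture<Void> startFuture() {
        return startFuture;
    }
}
```

In the second variant, code that needs the table would wait on {{startFuture()}} rather than on the event handler's result.
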







[jira] [Updated] (IGNITE-21730) Start tables aside of Metastorage event

2024-03-11 Thread Andrey Mashenkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov updated IGNITE-21730:
--
Labels: ignite-3 perfomance  (was: ignite-3)

> Start tables aside of Metastorage event
> ---
>
> Key: IGNITE-21730
> URL: https://issues.apache.org/jira/browse/IGNITE-21730
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Andrey Mashenkov
>Priority: Major
>  Labels: ignite-3, perfomance
> Fix For: 3.0
>
>
> Currently, we start raft groups and storages asynchronously, but as part 
> of the event handling flow. Thus the metastorage watcher can't proceed with 
> the next event until the current one has been handled (raft groups have been 
> started and storages have been created).
> This unwanted long operation affects lease updates and leads to metastorage 
> errors.





[jira] [Updated] (IGNITE-21730) Start tables aside of Metastorage event

2024-03-11 Thread Andrey Mashenkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov updated IGNITE-21730:
--
Fix Version/s: 3.0

> Start tables aside of Metastorage event
> ---
>
> Key: IGNITE-21730
> URL: https://issues.apache.org/jira/browse/IGNITE-21730
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Andrey Mashenkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0
>
>
> Currently, we start raft groups and storages asynchronously, but as part 
> of the event handling flow. Thus the metastorage watcher can't proceed with 
> the next event until the current one has been handled (raft groups have been 
> started and storages have been created).
> This unwanted long operation affects lease updates and leads to metastorage 
> errors.





[jira] [Assigned] (IGNITE-20384) Clean up abandoned resources for destroyed tables in catalog

2024-03-11 Thread Andrey Mashenkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov reassigned IGNITE-20384:
-

Assignee: Andrey Mashenkov

> Clean up abandoned resources for destroyed tables in catalog
> 
>
> Key: IGNITE-20384
> URL: https://issues.apache.org/jira/browse/IGNITE-20384
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Andrey Mashenkov
>Priority: Major
>  Labels: ignite-3
>
> We need to clean up abandoned resources (from vault and metastore) for 
> destroyed tables from the catalog.
> Perhaps this will be split into two separate tickets.





[jira] [Updated] (IGNITE-21729) Prevent threads from being hijacked via async cursors in KV/Record view APIs

2024-03-11 Thread Roman Puchkovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-21729:
---
Description: query() methods return AsyncCursors. AsyncCursor has methods 
returning CompletableFutures. We need to prevent thread hijacking via these 
futures.

> Prevent threads from being hijacked via async cursors in KV/Record view APIs
> 
>
> Key: IGNITE-21729
> URL: https://issues.apache.org/jira/browse/IGNITE-21729
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> query() methods return AsyncCursors. AsyncCursor has methods returning 
> CompletableFutures. We need to prevent thread hijacking via these futures.
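
One common way to keep user callbacks off internal threads is to hand out a separate future that is completed on a dedicated executor. The sketch below shows that general technique with plain JDK classes; it is not necessarily how Ignite implements it, and {{HijackGuard}} and {{completeOn}} are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical helper used only for illustration.
class HijackGuard {
    /**
     * Returns a new future that mirrors the internal one but is completed on userExecutor.
     * Callbacks attached to the returned future before it completes therefore run on a
     * userExecutor thread rather than on the thread that completed the internal future.
     */
    static <T> CompletableFuture<T> completeOn(CompletableFuture<T> internal, Executor userExecutor) {
        CompletableFuture<T> result = new CompletableFuture<>();

        internal.whenCompleteAsync((value, error) -> {
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(value);
            }
        }, userExecutor);

        return result;
    }
}
```

The explicit {{whenCompleteAsync}} hand-off is used so that both the normal and the exceptional result are published from the executor thread.
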





[jira] [Created] (IGNITE-21729) Prevent threads from being hijacked via async cursors in KV/Record view APIs

2024-03-11 Thread Roman Puchkovskiy (Jira)
Roman Puchkovskiy created IGNITE-21729:
--

 Summary: Prevent threads from being hijacked via async cursors in 
KV/Record view APIs
 Key: IGNITE-21729
 URL: https://issues.apache.org/jira/browse/IGNITE-21729
 Project: Ignite
  Issue Type: Improvement
Reporter: Roman Puchkovskiy
Assignee: Roman Puchkovskiy
 Fix For: 3.0.0-beta2








[jira] [Updated] (IGNITE-21728) Close cursors synchronously in ExecutionServiceImplTest

2024-03-11 Thread Aleksandr Polovtcev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-21728:
-
Fix Version/s: 3.0.0-beta2

> Close cursors synchronously in ExecutionServiceImplTest
> ---
>
> Key: IGNITE-21728
> URL: https://issues.apache.org/jira/browse/IGNITE-21728
> Project: Ignite
>  Issue Type: Bug
>Reporter: Aleksandr Polovtcev
>Assignee: Aleksandr Polovtcev
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using 
> {{closeAsync}}, but in some tests nobody waits for the returned future to 
> complete, which may cause race conditions.
> An example that was found in the logs during 
> {{testErrorIsPropagatedToPrefetchCallback}} execution:
> {noformat}
> [2024-03-11T15:24:02,481][INFO 
> ][%node_1%sql-execution-pool-0][ExecutionServiceImpl] Unable to send error 
> message
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$1368/0x000800820c40@4ff7a879
>  rejected from java.util.concurrent.ThreadPoolExecutor@145a1f2a[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
>  ~[?:?]
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
>  ~[?:?]
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
>  ~[?:?]
>   at 
> org.apache.ignite.internal.thread.AbstractStripedThreadPoolExecutor.execute(AbstractStripedThreadPoolExecutor.java:61)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.execute(QueryTaskExecutorImpl.java:82)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.execute(QueryTaskExecutorImpl.java:104)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.lambda$onReceive$2(ExecutionServiceImplTest.java:1088)
>  ~[test/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.onReceive(ExecutionServiceImplTest.java:1098)
>  ~[test/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode$1.send(ExecutionServiceImplTest.java:1017)
>  ~[test/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.handleError(ExecutionServiceImpl.java:842)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$submitFragment$11(ExecutionServiceImpl.java:829)
>  ~[main/:?]
>   at 
> java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
>  ~[?:?]
>   at 
> java.base/java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:1004)
>  ~[?:?]
>   at 
> java.base/java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2307)
>  ~[?:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.submitFragment(ExecutionServiceImpl.java:828)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.submitFragment(ExecutionServiceImpl.java:505)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.onMessage(ExecutionServiceImpl.java:404)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.lambda$start$1(ExecutionServiceImpl.java:253)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.lambda$onReceive$0(ExecutionServiceImplTest.java:1086)
>  ~[test/:?]
>   at 
> org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:85)
>  ~[main/:?]
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   at java.base/java.lang.Thread.run(Thread.java:834) [?:?]
> {noformat}
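
The race described above comes from stopping the executor while the close is still in flight. A minimal sketch of waiting for the close first is shown below; {{FakeCursor}} and {{SyncCloseSketch}} are hypothetical stand-ins (not the Ignite {{AsyncCursor}}), and the 10-second timeout is an arbitrary choice.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical cursor whose close work runs on the given pool.
class FakeCursor {
    private final ExecutorService pool;
    private final AtomicBoolean closed = new AtomicBoolean();

    FakeCursor(ExecutorService pool) {
        this.pool = pool;
    }

    CompletableFuture<Void> closeAsync() {
        return CompletableFuture.runAsync(() -> closed.set(true), pool);
    }

    boolean isClosed() {
        return closed.get();
    }
}

class SyncCloseSketch {
    /** Waits for the close to finish (or time out) before the pool is shut down. */
    static void closeAndStop(FakeCursor cursor, ExecutorService pool) {
        try {
            cursor.closeAsync().orTimeout(10, TimeUnit.SECONDS).join();
        } finally {
            pool.shutdown();
        }
    }
}
```
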





[jira] [Updated] (IGNITE-21728) Close cursors synchronously in ExecutionServiceImplTest

2024-03-11 Thread Pavel Pereslegin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-21728:
--
Description: 
{{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using {{closeAsync}}, 
but in some tests nobody waits for the returned future to complete, which may 
cause race conditions.

An example that was found in the logs during 
{{testErrorIsPropagatedToPrefetchCallback}} execution:
{noformat}
[2024-03-11T15:24:02,481][INFO 
][%node_1%sql-execution-pool-0][ExecutionServiceImpl] Unable to send error 
message
java.util.concurrent.RejectedExecutionException: Task 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$1368/0x000800820c40@4ff7a879
 rejected from java.util.concurrent.ThreadPoolExecutor@145a1f2a[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
at 
java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
 ~[?:?]
at 
org.apache.ignite.internal.thread.AbstractStripedThreadPoolExecutor.execute(AbstractStripedThreadPoolExecutor.java:61)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.execute(QueryTaskExecutorImpl.java:82)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.execute(QueryTaskExecutorImpl.java:104)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.lambda$onReceive$2(ExecutionServiceImplTest.java:1088)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.onReceive(ExecutionServiceImplTest.java:1098)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode$1.send(ExecutionServiceImplTest.java:1017)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.handleError(ExecutionServiceImpl.java:842)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$submitFragment$11(ExecutionServiceImpl.java:829)
 ~[main/:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
 ~[?:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:1004)
 ~[?:?]
at 
java.base/java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2307)
 ~[?:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.submitFragment(ExecutionServiceImpl.java:828)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.submitFragment(ExecutionServiceImpl.java:505)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.onMessage(ExecutionServiceImpl.java:404)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.lambda$start$1(ExecutionServiceImpl.java:253)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.lambda$onReceive$0(ExecutionServiceImplTest.java:1086)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:85)
 ~[main/:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 [?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 [?:?]
at java.base/java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}


  was:
{{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using {{closeAsync}}, 
but in some tests nobody waits for the returned future to complete, which may 
cause race conditions.

An example that was found in the logs:
{noformat}
[2024-03-11T15:24:02,481][INFO 
][%node_1%sql-execution-pool-0][ExecutionServiceImpl] Unable to send error 
message
java.util.concurrent.RejectedExecutionException: Task 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$1368/0x000800820c40@4ff7a879
 rejected from java.util.concurrent.ThreadPoolExecutor@145a1f2a[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
at 
java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
 ~[?:?]
at 

[jira] [Updated] (IGNITE-21728) Close cursors synchronously in ExecutionServiceImplTest

2024-03-11 Thread Pavel Pereslegin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-21728:
--
Description: 
{{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using {{closeAsync}}, 
but in some tests nobody waits for the returned future to complete, which may 
cause race conditions.

An example that was found in the logs:
{noformat}
[2024-03-11T15:24:02,481][INFO 
][%node_1%sql-execution-pool-0][ExecutionServiceImpl] Unable to send error 
message
java.util.concurrent.RejectedExecutionException: Task 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$1368/0x000800820c40@4ff7a879
 rejected from java.util.concurrent.ThreadPoolExecutor@145a1f2a[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
at 
java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
 ~[?:?]
at 
org.apache.ignite.internal.thread.AbstractStripedThreadPoolExecutor.execute(AbstractStripedThreadPoolExecutor.java:61)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.execute(QueryTaskExecutorImpl.java:82)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.execute(QueryTaskExecutorImpl.java:104)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.lambda$onReceive$2(ExecutionServiceImplTest.java:1088)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.onReceive(ExecutionServiceImplTest.java:1098)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode$1.send(ExecutionServiceImplTest.java:1017)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.handleError(ExecutionServiceImpl.java:842)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$submitFragment$11(ExecutionServiceImpl.java:829)
 ~[main/:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
 ~[?:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:1004)
 ~[?:?]
at 
java.base/java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2307)
 ~[?:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.submitFragment(ExecutionServiceImpl.java:828)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.submitFragment(ExecutionServiceImpl.java:505)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.onMessage(ExecutionServiceImpl.java:404)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.lambda$start$1(ExecutionServiceImpl.java:253)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.lambda$onReceive$0(ExecutionServiceImplTest.java:1086)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:85)
 ~[main/:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 [?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 [?:?]
at java.base/java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}


  was:
{{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using {{closeAsync}} 
but in some tests nobody waits for the return future to complete, which may 
pose race conditions (when the test is stopped).

An example that was found in the logs:
{noformat}
[2024-03-11T15:24:02,481][INFO 
][%node_1%sql-execution-pool-0][ExecutionServiceImpl] Unable to send error 
message
java.util.concurrent.RejectedExecutionException: Task 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$1368/0x000800820c40@4ff7a879
 rejected from java.util.concurrent.ThreadPoolExecutor@145a1f2a[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
at 
java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
 ~[?:?]
at 

[jira] [Updated] (IGNITE-21728) Close cursors synchronously in ExecutionServiceImplTest

2024-03-11 Thread Pavel Pereslegin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-21728:
--
Description: 
{{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using {{closeAsync}}, 
but some tests do not wait for the returned future to complete, which may 
cause race conditions when the test is stopped.

An example that was found in the logs:
{noformat}
[2024-03-11T15:24:02,481][INFO 
][%node_1%sql-execution-pool-0][ExecutionServiceImpl] Unable to send error 
message
java.util.concurrent.RejectedExecutionException: Task 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl$$Lambda$1368/0x000800820c40@4ff7a879
 rejected from java.util.concurrent.ThreadPoolExecutor@145a1f2a[Terminated, 
pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
at 
java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
 ~[?:?]
at 
org.apache.ignite.internal.thread.AbstractStripedThreadPoolExecutor.execute(AbstractStripedThreadPoolExecutor.java:61)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.execute(QueryTaskExecutorImpl.java:82)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.execute(QueryTaskExecutorImpl.java:104)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.lambda$onReceive$2(ExecutionServiceImplTest.java:1088)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.onReceive(ExecutionServiceImplTest.java:1098)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode$1.send(ExecutionServiceImplTest.java:1017)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.handleError(ExecutionServiceImpl.java:842)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$submitFragment$11(ExecutionServiceImpl.java:829)
 ~[main/:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
 ~[?:?]
at 
java.base/java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:1004)
 ~[?:?]
at 
java.base/java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2307)
 ~[?:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.submitFragment(ExecutionServiceImpl.java:828)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.submitFragment(ExecutionServiceImpl.java:505)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.onMessage(ExecutionServiceImpl.java:404)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl.lambda$start$1(ExecutionServiceImpl.java:253)
 ~[main/:?]
at 
org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImplTest$TestCluster$TestNode.lambda$onReceive$0(ExecutionServiceImplTest.java:1086)
 ~[test/:?]
at 
org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:85)
 ~[main/:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 [?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 [?:?]
at java.base/java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}


  was:{{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using 
{{closeAsync}} but in some tests nobody waits for the return future to 
complete, which may pose race conditions.


> Close cursors synchronously in ExecutionServiceImplTest
> ---
>
> Key: IGNITE-21728
> URL: https://issues.apache.org/jira/browse/IGNITE-21728
> Project: Ignite
>  Issue Type: Bug
>Reporter: Aleksandr Polovtcev
>Assignee: Aleksandr Polovtcev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using 
> {{closeAsync}}, but some tests do not wait for the returned future to 
> complete, which may cause race conditions when the test is stopped.
> An example that was found in the logs:
> {noformat}
> [2024-03-11T15:24:02,481][INFO 
> ][%node_1%sql-execution-pool-0][ExecutionServiceImpl] 

[jira] [Updated] (IGNITE-21726) Sql. Enable metrics by default

2024-03-11 Thread Pavel Pereslegin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-21726:
--
Description: 
Currently, all metrics in AI3 are disabled by default.

Since we believe that all existing metrics in AI3 do not impact performance, 
let's enable them all by default. This will save time on analyzing issues that 
users may encounter.

  was:
Currently by default all metrics are disabled.

Since we believe that all existing metrics in AI3 do not impact performance, 
let's enable them all by default. This will save time on analyzing issues that 
users may encounter.


> Sql. Enable metrics by default
> --
>
> Key: IGNITE-21726
> URL: https://issues.apache.org/jira/browse/IGNITE-21726
> Project: Ignite
>  Issue Type: Task
>Reporter: Pavel Pereslegin
>Priority: Major
>  Labels: ignite-3
>
> Currently, all metrics in AI3 are disabled by default.
> Since we believe that all existing metrics in AI3 do not impact performance, 
> let's enable them all by default. This will save time on analyzing issues 
> that users may encounter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21726) Sql. Enable metrics by default

2024-03-11 Thread Pavel Pereslegin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-21726:
--
Description: 
Currently, all metrics are disabled by default.

Since we believe that all existing metrics in AI3 do not impact performance, 
let's enable them all by default. This will save time on analyzing issues that 
users may encounter.

  was:
Currently by default all metrics are disabled.

Since we believe that all existing metrics in AI3 do not impact performance, 
let's enable them all by default. This will save time on analyzing issues that 
customers may encounter.


> Sql. Enable metrics by default
> --
>
> Key: IGNITE-21726
> URL: https://issues.apache.org/jira/browse/IGNITE-21726
> Project: Ignite
>  Issue Type: Task
>Reporter: Pavel Pereslegin
>Priority: Major
>  Labels: ignite-3
>
> Currently, all metrics are disabled by default.
> Since we believe that all existing metrics in AI3 do not impact performance, 
> let's enable them all by default. This will save time on analyzing issues 
> that users may encounter.





[jira] [Updated] (IGNITE-21726) Sql. Enable metrics by default

2024-03-11 Thread Pavel Pereslegin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-21726:
--
Description: 
Currently, all metrics are disabled by default.

Since we believe that all existing metrics in AI3 impact performance, let's 
enable them all by default. This will save time on analyzing issues that 
customers may encounter.

  was:
Currently by default all metrics are disabled.
It is recommended to enable all metrics that do not add significant performance 
overhead.

At the moment we are counting. that all existing metrics in AI3 should not 
affect performance, so let's enable them all by default.


> Sql. Enable metrics by default
> --
>
> Key: IGNITE-21726
> URL: https://issues.apache.org/jira/browse/IGNITE-21726
> Project: Ignite
>  Issue Type: Task
>Reporter: Pavel Pereslegin
>Priority: Major
>  Labels: ignite-3
>
> Currently, all metrics are disabled by default.
> Since we believe that all existing metrics in AI3 impact performance, let's 
> enable them all by default. This will save time on analyzing issues that 
> customers may encounter.





[jira] [Updated] (IGNITE-21726) Sql. Enable metrics by default

2024-03-11 Thread Pavel Pereslegin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-21726:
--
Description: 
Currently, all metrics are disabled by default.

Since we believe that all existing metrics in AI3 do not impact performance, 
let's enable them all by default. This will save time on analyzing issues that 
customers may encounter.

  was:
Currently by default all metrics are disabled.

Since we believe that all existing metrics in AI3 impact performance, let's 
enable them all by default. This will save time on analyzing issues that 
customers may encounter.


> Sql. Enable metrics by default
> --
>
> Key: IGNITE-21726
> URL: https://issues.apache.org/jira/browse/IGNITE-21726
> Project: Ignite
>  Issue Type: Task
>Reporter: Pavel Pereslegin
>Priority: Major
>  Labels: ignite-3
>
> Currently, all metrics are disabled by default.
> Since we believe that all existing metrics in AI3 do not impact performance, 
> let's enable them all by default. This will save time on analyzing issues 
> that customers may encounter.





[jira] [Updated] (IGNITE-21728) Close cursors synchronously in ExecutionServiceImplTest

2024-03-11 Thread Aleksandr Polovtcev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-21728:
-
Description: {{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed 
using {{closeAsync}}, but some tests do not wait for the returned future to 
complete, which may cause race conditions.  (was: {{AsyncCursor}} in )

> Close cursors synchronously in ExecutionServiceImplTest
> ---
>
> Key: IGNITE-21728
> URL: https://issues.apache.org/jira/browse/IGNITE-21728
> Project: Ignite
>  Issue Type: Bug
>Reporter: Aleksandr Polovtcev
>Assignee: Aleksandr Polovtcev
>Priority: Major
>  Labels: ignite-3
>
> {{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using 
> {{closeAsync}}, but some tests do not wait for the returned future to 
> complete, which may cause race conditions.





[jira] [Created] (IGNITE-21727) Close cursors synchronously in ExecutionServiceImplTest

2024-03-11 Thread Aleksandr Polovtcev (Jira)
Aleksandr Polovtcev created IGNITE-21727:


 Summary: Close cursors synchronously in ExecutionServiceImplTest
 Key: IGNITE-21727
 URL: https://issues.apache.org/jira/browse/IGNITE-21727
 Project: Ignite
  Issue Type: Improvement
Reporter: Aleksandr Polovtcev
Assignee: Aleksandr Polovtcev


{{AsyncCursor}} in {{ExecutionServiceImplTest}} is closed using {{closeAsync}}, 
but some tests do not wait for the returned future to complete, which can 
cause race conditions.
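
A minimal sketch of the intended pattern, assuming a cursor whose {{closeAsync}} returns a CompletableFuture. FakeCursor, CursorCloseExample and the 10-second timeout are hypothetical names and values chosen for illustration; this is not the code from ExecutionServiceImplTest. The point is only that the test blocks on the close future before the executor is shut down:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an async cursor; not the real Ignite AsyncCursor. */
final class FakeCursor {
    private final ExecutorService pool;
    private volatile boolean closed;

    FakeCursor(ExecutorService pool) {
        this.pool = pool;
    }

    /** Performs the cleanup on the pool; the future completes when cleanup is done. */
    CompletableFuture<Void> closeAsync() {
        return CompletableFuture.runAsync(() -> closed = true, pool);
    }

    boolean isClosed() {
        return closed;
    }
}

final class CursorCloseExample {
    /** Blocks until the close future completes (or fails after the timeout). */
    static void closeAndAwait(FakeCursor cursor) {
        cursor.closeAsync().orTimeout(10, TimeUnit.SECONDS).join();
    }

    /** Closes the cursor, then stops the pool, and reports whether the cursor is closed. */
    static boolean run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        FakeCursor cursor = new FakeCursor(pool);
        try {
            closeAndAwait(cursor);
        } finally {
            // The close has already completed, so the shutdown cannot reject cleanup work.
            pool.shutdown();
        }
        return cursor.isClosed();
    }
}
```

If the join were omitted, the pool could be terminated while cleanup tasks are still being submitted, which would be consistent with the RejectedExecutionException reported for this issue.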





[jira] [Updated] (IGNITE-21728) Close cursors synchronously in ExecutionServiceImplTest

2024-03-11 Thread Aleksandr Polovtcev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-21728:
-
Description: {{AsyncCursor}} in 

> Close cursors synchronously in ExecutionServiceImplTest
> ---
>
> Key: IGNITE-21728
> URL: https://issues.apache.org/jira/browse/IGNITE-21728
> Project: Ignite
>  Issue Type: Bug
>Reporter: Aleksandr Polovtcev
>Assignee: Aleksandr Polovtcev
>Priority: Major
>  Labels: ignite-3
>
> {{AsyncCursor}} in 





[jira] [Created] (IGNITE-21728) Close cursors synchronously in ExecutionServiceImplTest

2024-03-11 Thread Aleksandr Polovtcev (Jira)
Aleksandr Polovtcev created IGNITE-21728:


 Summary: Close cursors synchronously in ExecutionServiceImplTest
 Key: IGNITE-21728
 URL: https://issues.apache.org/jira/browse/IGNITE-21728
 Project: Ignite
  Issue Type: Bug
Reporter: Aleksandr Polovtcev
Assignee: Aleksandr Polovtcev








[jira] [Created] (IGNITE-21726) Sql. Enable metrics by default

2024-03-11 Thread Pavel Pereslegin (Jira)
Pavel Pereslegin created IGNITE-21726:
-

 Summary: Sql. Enable metrics by default
 Key: IGNITE-21726
 URL: https://issues.apache.org/jira/browse/IGNITE-21726
 Project: Ignite
  Issue Type: Task
Reporter: Pavel Pereslegin


Currently, all metrics are disabled by default.
It is recommended to enable all metrics that do not add significant performance 
overhead.

At the moment we assume that the existing metrics in AI3 should not 
affect performance, so let's enable them all by default.





[jira] [Updated] (IGNITE-21718) Extract volatile state in AbstractPageMemoryMvPartitionStorage into a separate class

2024-03-11 Thread Aleksandr Polovtcev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-21718:
-
Fix Version/s: 3.0.0-beta2

> Extract volatile state in AbstractPageMemoryMvPartitionStorage into a 
> separate class
> 
>
> Key: IGNITE-21718
> URL: https://issues.apache.org/jira/browse/IGNITE-21718
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Aleksandr Polovtcev
>Assignee: Aleksandr Polovtcev
>Priority: Minor
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {{AbstractPageMemoryMvPartitionStorage}} contains a bunch of volatile fields 
> that get replaced during a rebalance cleanup. I propose to wrap these fields 
> in a single class in order to make the code a little bit more maintainable. I 
> see the following benefits:
> # It will become easy to understand which components of the storage may be 
> updated;
> # It will be easier to add more volatile components and not forget to update 
> them;
> # It will become easier to avoid unnecessary volatile reads, because the 
> whole state can be fetched using a single read.
>  
> The only downside I can see is that the code may become a little bit more 
> verbose, because you will need to access the state class first.
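
The proposal above could look roughly like the following sketch. StorageState, PartitionStorageExample and the two String fields are hypothetical placeholders for the real storage components, which are not listed in this ticket. It only illustrates the idea of one volatile reference to an immutable state object:

```java
/** Hypothetical immutable snapshot of the components replaced during a rebalance cleanup. */
final class StorageState {
    final String versionChainTree;
    final String freeList;

    StorageState(String versionChainTree, String freeList) {
        this.versionChainTree = versionChainTree;
        this.freeList = freeList;
    }
}

final class PartitionStorageExample {
    /** One volatile reference instead of several independent volatile fields. */
    private volatile StorageState state;

    PartitionStorageExample(StorageState initial) {
        this.state = initial;
    }

    /** Replaces all components at once, so none of them can be forgotten. */
    void replaceState(StorageState newState) {
        this.state = newState;
    }

    /** Reads the reference once, so both components come from the same snapshot. */
    String describe() {
        StorageState s = state; // single volatile read
        return s.versionChainTree + "/" + s.freeList;
    }
}
```

Readers pay for the extra indirection through the state object, which is the verbosity downside mentioned in the description.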





[jira] [Updated] (IGNITE-21646) Clean FreeList when cleaning BplusTree

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko updated IGNITE-21646:
-
Fix Version/s: 3.0.0-beta2

> Clean FreeList when cleaning BplusTree
> --
>
> Key: IGNITE-21646
> URL: https://issues.apache.org/jira/browse/IGNITE-21646
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> When 
> org.apache.ignite.internal.pagememory.tree.BplusTree#startGradualDestruction 
> was implemented, clearing pages from the FreeLists was overlooked.





[jira] [Assigned] (IGNITE-21646) Clean FreeList when cleaning BplusTree

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko reassigned IGNITE-21646:


Assignee: Kirill Tkalenko

> Clean FreeList when cleaning BplusTree
> --
>
> Key: IGNITE-21646
> URL: https://issues.apache.org/jira/browse/IGNITE-21646
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: ignite-3
>
> When 
> org.apache.ignite.internal.pagememory.tree.BplusTree#startGradualDestruction 
> was implemented, clearing pages from the FreeLists was overlooked.





[jira] [Resolved] (IGNITE-21646) Clean FreeList when cleaning BplusTree

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko resolved IGNITE-21646.
--
Resolution: Invalid

After checking the code, I saw that the *FreeList* is already being cleared, so 
this task is not needed.

> Clean FreeList when cleaning BplusTree
> --
>
> Key: IGNITE-21646
> URL: https://issues.apache.org/jira/browse/IGNITE-21646
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Priority: Major
>  Labels: ignite-3
>
> When 
> org.apache.ignite.internal.pagememory.tree.BplusTree#startGradualDestruction 
> was implemented, clearing pages from the FreeLists was overlooked.





[jira] [Updated] (IGNITE-21703) IgniteSqlFunctions.octetLength relies on default encoding

2024-03-11 Thread Viacheslav Blinov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viacheslav Blinov updated IGNITE-21703:
---
Component/s: sql

> IgniteSqlFunctions.octetLength relies on default encoding
> -
>
> Key: IGNITE-21703
> URL: https://issues.apache.org/jira/browse/IGNITE-21703
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Reporter: Viacheslav Blinov
>Priority: Minor
>  Labels: ignite3
>
> Issue detected by SpotBugs. Specifically the warning reported is:
> {noformat}
> H I DM_DEFAULT_ENCODING Dm: Found reliance on default encoding in 
> org.apache.ignite.internal.sql.engine.exec.exp.IgniteSqlFunctions.octetLength(String):
>  String.getBytes()  At IgniteSqlFunctions.java:[line 133]
> {noformat}
> This looks like a potential bug if the system default encoding is something 
> exotic.
> Investigate whether this is a false positive that we should suppress, or 
> whether a proper fix is needed.
> As a result of the investigation, the corresponding TODO should be removed 
> from spotbugs-excludes.xml
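
For illustration, one possible shape of a fix is to pass an explicit charset to String.getBytes. The sketch below assumes UTF-8 is the desired encoding, which the ticket does not state. OctetLengthExample is a hypothetical class, not the actual IgniteSqlFunctions code:

```java
import java.nio.charset.StandardCharsets;

final class OctetLengthExample {
    /** Byte length of the string in UTF-8, independent of the JVM default charset. */
    static int octetLength(String s) {
        // Assumption: UTF-8 is the intended encoding; the ticket leaves this open.
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

With an explicit charset the result no longer depends on the platform, for example on the file.encoding system property.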





[jira] [Updated] (IGNITE-21703) Sql. IgniteSqlFunctions.octetLength relies on default encoding

2024-03-11 Thread Viacheslav Blinov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viacheslav Blinov updated IGNITE-21703:
---
Summary: Sql. IgniteSqlFunctions.octetLength relies on default encoding  
(was: IgniteSqlFunctions.octetLength relies on default encoding)

> Sql. IgniteSqlFunctions.octetLength relies on default encoding
> --
>
> Key: IGNITE-21703
> URL: https://issues.apache.org/jira/browse/IGNITE-21703
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Reporter: Viacheslav Blinov
>Priority: Minor
>  Labels: ignite3
>
> Issue detected by SpotBugs. Specifically the warning reported is:
> {noformat}
> H I DM_DEFAULT_ENCODING Dm: Found reliance on default encoding in 
> org.apache.ignite.internal.sql.engine.exec.exp.IgniteSqlFunctions.octetLength(String):
>  String.getBytes()  At IgniteSqlFunctions.java:[line 133]
> {noformat}
> This looks like a potential bug if the system default encoding is something 
> exotic.
> Investigate whether this is a false positive that we should suppress, or 
> whether a proper fix is needed.
> As a result of the investigation, the corresponding TODO should be removed 
> from spotbugs-excludes.xml





[jira] [Updated] (IGNITE-21709) Revise TimestampAware messages processing

2024-03-11 Thread Pavel Pereslegin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-21709:
--
Description: 
{{TimestampAware}} messages contain a hybrid timestamp used to adjust a hybrid 
logical clock.

Currently, ReplicaManager updates the local clock when it receives a 
{{ReplicaRequest}} with a timestamp.

It may be worth revising the design and adding general processing of such 
messages (probably at the {{MessagingService}} level).
For example, it is also necessary to adjust the clock when receiving a 
{{QueryBatchMessage}} (sql-engine) and currently each component must duplicate 
the clock adjusting logic.


  was:
{{TimestampAware}} messages contain timestamp to adjust a hybrid logical clock.

Currently, ReplicaManager updates the local clock when it receives a 
{{ReplicaRequest}} with a timestamp.

It may be worth revising the design and adding general processing of such 
messages (probably at the {{MessagingService}} level).
For example, it is also necessary to adjust the clock when receiving a 
{{QueryBatchMessage}} (sql-engine) and currently each component must duplicate 
the clock adjusting logic.



> Revise TimestampAware messages processing
> -
>
> Key: IGNITE-21709
> URL: https://issues.apache.org/jira/browse/IGNITE-21709
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Pereslegin
>Priority: Major
>  Labels: ignite-3
>
> {{TimestampAware}} messages contain a hybrid timestamp used to adjust a 
> hybrid logical clock.
> Currently, ReplicaManager updates the local clock when it receives a 
> {{ReplicaRequest}} with a timestamp.
> It may be worth revising the design and adding general processing of such 
> messages (probably at the {{MessagingService}} level).
> For example, it is also necessary to adjust the clock when receiving a 
> {{QueryBatchMessage}} (sql-engine) and currently each component must 
> duplicate the clock adjusting logic.
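
A rough sketch of what such general processing might look like: a wrapper adjusts the clock once, before any component-specific handler sees the message. TimestampCarrier, SimpleClock and ClockAdjustingHandler are hypothetical names, and the clock is a plain logical counter rather than a real hybrid logical clock. None of this is the Ignite MessagingService API:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical marker for messages that carry a sender timestamp. */
interface TimestampCarrier {
    long timestamp();
}

/** Simplified logical clock; a real hybrid clock also tracks physical time. */
final class SimpleClock {
    private final AtomicLong now = new AtomicLong();

    /** Moves the clock past the received timestamp and returns the new value. */
    long update(long received) {
        return now.updateAndGet(local -> Math.max(local, received) + 1);
    }

    long current() {
        return now.get();
    }
}

final class ClockAdjustingHandler {
    /** Wraps a handler so the clock is adjusted in one place for every timestamped message. */
    static Consumer<Object> wrap(SimpleClock clock, Consumer<Object> delegate) {
        return message -> {
            if (message instanceof TimestampCarrier) {
                clock.update(((TimestampCarrier) message).timestamp());
            }
            delegate.accept(message);
        };
    }
}
```

With a wrapper like this installed at the messaging layer, individual components would not need to repeat the adjustment for each message type.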





[jira] [Updated] (IGNITE-21725) The exception "Primary replica has expired" on creation of 1000 tables

2024-03-11 Thread Igor (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor updated IGNITE-21725:
--
Summary: The exception "Primary replica has expired" on creation of 1000 
tables  (was: The exception "Primary replica has expired" on a lot creation of 
1000 tables)

> The exception "Primary replica has expired" on creation of 1000 tables
> --
>
> Key: IGNITE-21725
> URL: https://issues.apache.org/jira/browse/IGNITE-21725
> Project: Ignite
>  Issue Type: Bug
>  Components: general, persistence
>Affects Versions: 3.0.0-beta1
>Reporter: Igor
>Priority: Major
>  Labels: ignite-3
>
> *Steps to reproduce:*
> 1. Start cluster with 1 node with JVM options: "-Xms4096m -Xmx4096m"
> 2. Create 1000 tables, one by one, with 200 varchar columns each, and insert 
> 1 row into each.
> *Expected result:*
> Tables are created.
> *Actual result:*
> On table 949 the exception is thrown:
> {code:java}
> java.sql.SQLException: Primary replica has expired, transaction will be 
> rolled back: [groupId = 1850_part_21, expected enlistment consistency token = 
> 112069202113202526, commit timestamp = HybridTimestamp [physical=2024-03-10 
> 03:13:16:057 +, logical=396, composite=112069207395991948], current 
> primary replica = null]
>   at 
> org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
>   at 
> org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
>   at 
> org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeWithArguments(JdbcPreparedStatement.java:765)
>   at 
> org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:173)
>   at 
> org.gridgain.ai3tests.tests.TablesAmountCapacityTest.lambda$insertRowAndAssertTimeout$1(TablesAmountCapacityTest.java:166)
>   at 
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834) {code}
> In server logs there is an exception:
> {code:java}
> 2024-03-10 03:13:24:222 + 
> [WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-8][TxManagerImpl]
>  Failed to finish Tx. The operation will be retried 
> [txId=018e2659-b09f-009c-23c0-6ab50001].
> java.util.concurrent.CompletionException: 
> org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
> IGN-REP-3 TraceId:7ff7e851-9f18-4212-b317-a70a0a92fdfe Replication is timed 
> out [replicaGrpId=1850_part_21]
>     at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>     at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:704)
>     at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>     at 
> org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$0(ReplicaService.java:110)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: 
> org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
> IGN-REP-3 TraceId:7ff7e851-9f18-4212-b317-a70a0a92fdfe Replication is timed 
> out [replicaGrpId=1850_part_21]
>     ... 4 more
> 2024-03-10 03:13:24:290 + 
> [WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-22][TrackableNetworkMessageHandler]
>  Message handling has been too long [duration=67ms, message=[class 
> org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
> 2024-03-10 03:13:24:290 + 
> [WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-11][TrackableNetworkMessageHandler]
>  Message handling has been too long [duration=67ms, message=[class 
> org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
> 2024-03-10 03:13:24:290 + 
> [WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-19][TrackableNetworkMessageHandler]
>  Message handling has been too long [duration=67ms, message=[class 
> org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
> 2024-03-10 03:13:24:290 + 
> 

[jira] [Assigned] (IGNITE-21578) ItDurableFinishTest#testWaitForCleanup failed with NPE

2024-03-11 Thread Alexander Lapin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin reassigned IGNITE-21578:


Assignee:  Kirill Sizov  (was: Alexander Lapin)

>  ItDurableFinishTest#testWaitForCleanup failed with NPE
> ---
>
> Key: IGNITE-21578
> URL: https://issues.apache.org/jira/browse/IGNITE-21578
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Assignee:  Kirill Sizov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7870395?expandBuildDeploymentsSection=false=false=false=true+Inspection=true=true]
> {code:java}
>   Caused by: java.lang.NullPointerException
>     at 
> org.apache.ignite.internal.tx.impl.TxManagerImpl.lambda$finishFull$3(TxManagerImpl.java:472)
>  ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
>     at 
> org.apache.ignite.internal.tx.impl.VolatileTxStateMetaStorage.lambda$updateMeta$0(VolatileTxStateMetaStorage.java:73)
>  ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
>     at 
> java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908) 
> ~[?:?]
>     at 
> org.apache.ignite.internal.tx.impl.VolatileTxStateMetaStorage.updateMeta(VolatileTxStateMetaStorage.java:72)
>  ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
>     at 
> org.apache.ignite.internal.tx.impl.TxManagerImpl.updateTxMeta(TxManagerImpl.java:455)
>  ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
>     at 
> org.apache.ignite.internal.tx.impl.TxManagerImpl.finishFull(TxManagerImpl.java:472)
>  ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
>     at 
> org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$postEnlist$13(InternalTableImpl.java:593)
>  ~[ignite-table-3.0.0-SNAPSHOT.jar:?] {code}
> It seems the cause is that the old meta may be null when an exception occurs:
> {code:java}
>     public void finishFull(HybridTimestampTracker timestampTracker, UUID 
> txId, boolean commit) {
>         ...
>         updateTxMeta(txId, old -> new TxStateMeta(finalState, 
> old.txCoordinatorId(), old.commitPartitionId(), old.commitTimestamp()));
>         ...
>     }
> {code}
> {code:java}
>         return fut.handle((BiFunction>) 
> (r, e) -> {
>             if (full) { // Full txn is already finished remotely. Just update 
> local state.
>                 txManager.finishFull(observableTimestampTracker, tx0.id(), e 
> == null);{code}
>  
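
To illustrate the suspected cause: inside ConcurrentHashMap.compute the remapping function receives null when no mapping exists for the key, so dereferencing the old value throws a NullPointerException. The sketch below uses hypothetical, simplified types (Meta, TxMetaExample) rather than the real TxStateMeta and TxManagerImpl, and shows one null-tolerant variant. Whether the real fix should tolerate a missing meta or prevent it is not decided in this ticket:

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical, simplified transaction meta; not the real TxStateMeta. */
final class Meta {
    final String state;
    final String coordinatorId; // null when no previous meta existed

    Meta(String state, String coordinatorId) {
        this.state = state;
        this.coordinatorId = coordinatorId;
    }
}

final class TxMetaExample {
    private final ConcurrentHashMap<UUID, Meta> metas = new ConcurrentHashMap<>();

    /** Applies the updater inside compute(); 'old' is null when the key is absent. */
    Meta updateMeta(UUID txId, Function<Meta, Meta> updater) {
        return metas.compute(txId, (id, old) -> updater.apply(old));
    }

    /** Null-tolerant variant of the finishing update. */
    Meta finish(UUID txId, boolean commit) {
        String finalState = commit ? "COMMITTED" : "ABORTED";
        return updateMeta(txId, old -> new Meta(finalState, old == null ? null : old.coordinatorId));
    }
}
```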





[jira] [Created] (IGNITE-21725) The exception "Primary replica has expired" on a lot creation of 1000 tables

2024-03-11 Thread Igor (Jira)
Igor created IGNITE-21725:
-

 Summary: The exception "Primary replica has expired" on a lot 
creation of 1000 tables
 Key: IGNITE-21725
 URL: https://issues.apache.org/jira/browse/IGNITE-21725
 Project: Ignite
  Issue Type: Bug
  Components: general, persistence
Affects Versions: 3.0.0-beta1
Reporter: Igor


*Steps to reproduce:*

1. Start cluster with 1 node with JVM options: "-Xms4096m -Xmx4096m"

2. Create 1000 tables, one by one, with 200 varchar columns each, and insert 1 
row into each.
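
For reference, the DDL for step 2 could be generated along these lines. The table and column names, and the integer primary key, are assumptions made for this sketch. The actual statements used by TablesAmountCapacityTest are not shown in this report:

```java
final class TableDdlExample {
    /** Builds a CREATE TABLE statement with an integer key and the given number of VARCHAR columns. */
    static String createTableSql(int tableIdx, int varcharColumns) {
        StringBuilder sql = new StringBuilder("CREATE TABLE table_" + tableIdx + " (id INT PRIMARY KEY");
        for (int c = 0; c < varcharColumns; c++) {
            sql.append(", col_").append(c).append(" VARCHAR");
        }
        return sql.append(")").toString();
    }
}
```

Calling createTableSql(i, 200) for i from 0 to 999 and executing each statement over JDBC, followed by a single-row insert, would approximate the scenario described above.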

*Expected result:*
Tables are created.

*Actual result:*

On table 949 the exception is thrown:
{code:java}
java.sql.SQLException: Primary replica has expired, transaction will be rolled 
back: [groupId = 1850_part_21, expected enlistment consistency token = 
112069202113202526, commit timestamp = HybridTimestamp [physical=2024-03-10 
03:13:16:057 +, logical=396, composite=112069207395991948], current primary 
replica = null]
  at 
org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
  at 
org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
  at 
org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeWithArguments(JdbcPreparedStatement.java:765)
  at 
org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:173)
  at 
org.gridgain.ai3tests.tests.TablesAmountCapacityTest.lambda$insertRowAndAssertTimeout$1(TablesAmountCapacityTest.java:166)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  at java.base/java.lang.Thread.run(Thread.java:834) {code}
In server logs there is an exception:
{code:java}
2024-03-10 03:13:24:222 + 
[WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-8][TxManagerImpl]
 Failed to finish Tx. The operation will be retried 
[txId=018e2659-b09f-009c-23c0-6ab50001].
java.util.concurrent.CompletionException: 
org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
IGN-REP-3 TraceId:7ff7e851-9f18-4212-b317-a70a0a92fdfe Replication is timed out 
[replicaGrpId=1850_part_21]
    at 
java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    at 
java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
    at 
java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:704)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    at 
org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$0(ReplicaService.java:110)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: 
org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
IGN-REP-3 TraceId:7ff7e851-9f18-4212-b317-a70a0a92fdfe Replication is timed out 
[replicaGrpId=1850_part_21]
    ... 4 more
2024-03-10 03:13:24:290 + 
[WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-22][TrackableNetworkMessageHandler]
 Message handling has been too long [duration=67ms, message=[class 
org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
2024-03-10 03:13:24:290 + 
[WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-11][TrackableNetworkMessageHandler]
 Message handling has been too long [duration=67ms, message=[class 
org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
2024-03-10 03:13:24:290 + 
[WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-19][TrackableNetworkMessageHandler]
 Message handling has been too long [duration=67ms, message=[class 
org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
2024-03-10 03:13:24:290 + 
[WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-17][TrackableNetworkMessageHandler]
 Message handling has been too long [duration=67ms, message=[class 
org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
2024-03-10 03:13:24:290 + 
[WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-23][TrackableNetworkMessageHandler]
 Message handling has been too long [duration=67ms, message=[class 
org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
{code}

[jira] [Updated] (IGNITE-21724) Support "-ea" version in ItInitializedClusterRestTest

2024-03-11 Thread Mikhail Pochatkin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pochatkin updated IGNITE-21724:
---
Description: 
Every release we encounter the following problem:
 
{code:java}
"(?\\d+)\\.(?\\d+)\\.(?\\d+)((?-SNAPSHOT)|-(?alpha\\d+)|--(?beta\\d+))?"
 

Apache Ignite ver. 9.0.0-ea5{code}
 

  was:
Every release we encounter the following problem:
 
 {{java.lang.AssertionError:   Expected: a string matching the pattern 
<(?\d+)\.(?\d+)\.(?\d+)((?-SNAPSHOT)|-(?alpha\d+)|--(?beta\d+))?>
   but: the string was "9.0.0-ea5"}}
{{Apache Ignite ver. 9.0.0-ea5}}


> Support "-ea" version in ItInitializedClusterRestTest
> -
>
> Key: IGNITE-21724
> URL: https://issues.apache.org/jira/browse/IGNITE-21724
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Mikhail Pochatkin
>Assignee: Mikhail Pochatkin
>Priority: Major
>  Labels: ignite-3
>
> Every release we encounter the following problem:
>  
> {code:java}
> "(?\\d+)\\.(?\\d+)\\.(?\\d+)((?-SNAPSHOT)|-(?alpha\\d+)|--(?beta\\d+))?"
>  
> Apache Ignite ver. 9.0.0-ea5{code}
>  
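
A pattern that also accepts an "-ea" suffix might look like the sketch below. The group names here are assumptions, since the archived message lost them; the existing alternatives are copied as they appear above (including the double hyphen before beta), and only the ea alternative is new.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for a version pattern extended with an "-ea" alternative. */
final class VersionPatternSketch {
    // Group names (major, minor, maintenance, snapshot, alpha, beta, ea) are assumed.
    static final Pattern VERSION = Pattern.compile(
            "(?<major>\\d+)\\.(?<minor>\\d+)\\.(?<maintenance>\\d+)"
                    + "((?<snapshot>-SNAPSHOT)|-(?<alpha>alpha\\d+)|--(?<beta>beta\\d+)|-(?<ea>ea\\d+))?");

    /** Returns true if the whole string is a version accepted by the pattern. */
    static boolean isValid(String version) {
        return VERSION.matcher(version).matches();
    }
}
```

With this pattern, "9.0.0-ea5" is accepted while an unknown suffix such as "9.0.0-rc1" is still rejected.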





[jira] [Created] (IGNITE-21724) Support "-ea" version in ItInitializedClusterRestTest

2024-03-11 Thread Mikhail Pochatkin (Jira)
Mikhail Pochatkin created IGNITE-21724:
--

 Summary: Support "-ea" version in ItInitializedClusterRestTest
 Key: IGNITE-21724
 URL: https://issues.apache.org/jira/browse/IGNITE-21724
 Project: Ignite
  Issue Type: Improvement
Reporter: Mikhail Pochatkin
Assignee: Mikhail Pochatkin


Every release we encounter the following problem:
 
 {{java.lang.AssertionError:   Expected: a string matching the pattern 
<(?\d+)\.(?\d+)\.(?\d+)((?-SNAPSHOT)|-(?alpha\d+)|--(?beta\d+))?>
   but: the string was "9.0.0-ea5"}}
{{Apache Ignite ver. 9.0.0-ea5}}





[jira] [Resolved] (IGNITE-21501) Create index storages for new partitions on rebalance

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko resolved IGNITE-21501.
--
Resolution: Fixed

> Create index storages for new partitions on rebalance
> -
>
> Key: IGNITE-21501
> URL: https://issues.apache.org/jira/browse/IGNITE-21501
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> It appears that we only create index storages during "table creation", 
> not during "partition creation" if it's performed in isolation.
> Even if we did, 
> {{org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler#waitIndexes}}
>  is still badly designed, because it waits for indexes of the initial 
> partition distribution and cannot provide any guarantees when assignments 
> are changed.
> This leads to NPEs or bizarre assertions related to the aforementioned method.
> What we need to do is:
>  * Get rid of the faulty index awaiting mechanism.
>  * Create index storages before starting the raft group.
>  * [optional] There might be naturally occurring "races" between catalog 
> updates (index creation) and rebalance. Right now they are resolved by the 
> fact that these processes are linearized in watch processing, but that's not 
> the best approach. If we could provide something more robust, that would be 
> nice. Let's think about it at least.





[jira] [Updated] (IGNITE-20133) Compute hashes for integral/decimal columns in a stable way

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko updated IGNITE-20133:
-
Epic Link: IGNITE-21450  (was: IGNITE-17767)

> Compute hashes for integral/decimal columns in a stable way
> ---
>
> Key: IGNITE-20133
> URL: https://issues.apache.org/jira/browse/IGNITE-20133
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Priority: Minor
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> The idea is to make hash computation for integral and decimal types satisfy 
> the following property: if a column type is changed from an integral to a 
> decimal type, the hashes for values that are already stored remain the same.
> This will allow us to permit changing the type (integral -> decimal and 
> decimal -> longer decimal) of a column that is included in a HASH index.
> A hash that has this property is the following function: 
> hash(val.toString(TRIM_TRAILING_ZEROS)). For instance, for 1 it will be 
> hash("1"), for 1.000 it will also be hash("1"), but for 1.23 it will give 
> hash("1.23").
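
The property can be illustrated with a small sketch. It assumes java.math.BigDecimal as the value type and uses String.hashCode() as a placeholder for the real hash function; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

/** Hypothetical illustration of hashing numbers through a canonical string form. */
final class StableNumericHashSketch {
    /** Canonical text form: trailing zeros trimmed, no exponent notation. */
    static String canonical(BigDecimal val) {
        if (val.signum() == 0) {
            return "0"; // any zero (0, 0.00, ...) maps to the same text
        }
        // stripTrailingZeros() may yield a negative scale (e.g. 1E+2); toPlainString() undoes that.
        return val.stripTrailingZeros().toPlainString();
    }

    /** Hash of a decimal value; String.hashCode() stands in for the actual hash. */
    static int hash(BigDecimal val) {
        return canonical(val).hashCode();
    }

    /** Hash of an integral value, computed through the same canonical form. */
    static int hash(long val) {
        return hash(BigDecimal.valueOf(val));
    }
}
```

Because integral and decimal values go through the same canonical string, hash(1L) and hash(new BigDecimal("1.000")) agree, so values already stored keep their hashes after the column type changes.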





[jira] [Updated] (IGNITE-20134) Only allow changing type of indexed column when indexed values representation remains the same

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko updated IGNITE-20134:
-
Epic Link: IGNITE-21450  (was: IGNITE-17767)

> Only allow changing type of indexed column when indexed values representation 
> remains the same
> --
>
> Key: IGNITE-20134
> URL: https://issues.apache.org/jira/browse/IGNITE-20134
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> When an attempt is made to change the type of a column that is included in an 
> index, this should only be permitted if the representation of the column 
> values in the index will remain unchanged (and, hence, index rebuild will not 
> be needed).
> The following changes are acceptable:
> * integral->integral (as integral types are represented as varints)
> * integral->decimal (with enough precision) and float->double for SORTED 
> indices where the ordering remains the same
> * integral->decimal and decimal->decimal (with enough precision) for HASH 
> indices (requires IGNITE-20133)
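
The rules above could be encoded as a simple check, sketched below. All type and method names are hypothetical, the "enough precision" condition is reduced to a boolean supplied by the caller, and applying it to integral->decimal for both index kinds is an assumption.

```java
/** Hypothetical index kinds. */
enum IndexKindSketch { SORTED, HASH }

/** Hypothetical coarse type families of a column. */
enum TypeFamilySketch { INTEGRAL, DECIMAL, FLOAT, DOUBLE }

/** Hypothetical check for whether an indexed column type change keeps the index representation. */
final class IndexedColumnTypeChangeSketch {
    static boolean isAllowed(IndexKindSketch index, TypeFamilySketch from, TypeFamilySketch to,
            boolean enoughPrecision) {
        if (from == TypeFamilySketch.INTEGRAL && to == TypeFamilySketch.INTEGRAL) {
            return true; // integral types are represented as varints
        }
        if (from == TypeFamilySketch.INTEGRAL && to == TypeFamilySketch.DECIMAL) {
            return enoughPrecision; // for HASH indices this relies on IGNITE-20133
        }
        if (from == TypeFamilySketch.FLOAT && to == TypeFamilySketch.DOUBLE) {
            return index == IndexKindSketch.SORTED; // ordering is preserved
        }
        if (from == TypeFamilySketch.DECIMAL && to == TypeFamilySketch.DECIMAL) {
            return index == IndexKindSketch.HASH && enoughPrecision; // relies on IGNITE-20133
        }
        return false; // anything else would need an index rebuild
    }
}
```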





[jira] [Updated] (IGNITE-21126) Before starting to backfill an index, only wait for transactions where the index table is enlisted

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko updated IGNITE-21126:
-
Epic Link: IGNITE-21450  (was: IGNITE-17767)

> Before starting to backfill an index, only wait for transactions where the 
> index table is enlisted
> --
>
> Key: IGNITE-21126
> URL: https://issues.apache.org/jira/browse/IGNITE-21126
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
>
> IGNITE-21115 says that we must wait for all RW transactions started on old 
> schema versions to be finished before initiating an index backfill. This 
> means that a long-running RW transaction that never touches table A will 
> block backfilling of any new index created on table A, which is too 
> restrictive.
> The idea is that we modify the mechanism defined in IGNITE-21112 by also 
> passing tableId in RwTransactionsFinishedRequest; the request handling will 
> only take into account tables where any partition of the table is enlisted.
> This means that an RW transaction that was started before the index 
> appeared will be aborted if it tries to write to the index after the index 
> backfill starts.
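
The table-aware check could look roughly like the sketch below. It is only an illustration: RwTxSketch and its fields are hypothetical, and the real handling of RwTransactionsFinishedRequest may track transactions quite differently.

```java
import java.util.Collection;
import java.util.Set;

/** Hypothetical view of an RW transaction known to a node. */
final class RwTxSketch {
    final int startedOnCatalogVersion;
    final boolean finished;
    final Set<Integer> enlistedTableIds; // tables with at least one enlisted partition

    RwTxSketch(int startedOnCatalogVersion, boolean finished, Set<Integer> enlistedTableIds) {
        this.startedOnCatalogVersion = startedOnCatalogVersion;
        this.finished = finished;
        this.enlistedTableIds = enlistedTableIds;
    }
}

/** Hypothetical request handling that ignores transactions not touching the given table. */
final class RwTransactionsFinishedSketch {
    /** Returns true if no unfinished RW tx started before targetCatalogVersion has the table enlisted. */
    static boolean finishedForTable(Collection<RwTxSketch> txs, int tableId, int targetCatalogVersion) {
        for (RwTxSketch tx : txs) {
            boolean startedOnOldSchema = tx.startedOnCatalogVersion < targetCatalogVersion;
            if (startedOnOldSchema && !tx.finished && tx.enlistedTableIds.contains(tableId)) {
                return false;
            }
        }
        return true;
    }
}
```

A long-running transaction that only enlists other tables no longer blocks the backfill for table A in this scheme.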





[jira] [Updated] (IGNITE-17325) Implement a comparator for inlined BinaryTuple in sorted index

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko updated IGNITE-17325:
-
Epic Link: IGNITE-21450  (was: IGNITE-17767)

> Implement a comparator for inlined BinaryTuple in sorted index
> --
>
> Key: IGNITE-17325
> URL: https://issues.apache.org/jira/browse/IGNITE-17325
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> We need to implement an inlined *BinaryTuple* comparator in a sorted index 
> for a B+tree.
> You need to take into account the format of the *BinaryTuple* and the fact 
> that it can be truncated.
> As a basis, you can take 
> *org.apache.ignite.internal.storage.index.BinaryTupleComparator*.





[jira] [Commented] (IGNITE-20139) RandomForestClassifierTrainer accuracy issue

2024-03-11 Thread Igor Belyakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825164#comment-17825164
 ] 

Igor Belyakov commented on IGNITE-20139:


[~zaleslaw], could you please review the PR?

> RandomForestClassifierTrainer accuracy issue
> 
>
> Key: IGNITE-20139
> URL: https://issues.apache.org/jira/browse/IGNITE-20139
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.15
>Reporter: Alexandr Shapkin
>Assignee: Igor Belyakov
>Priority: Major
> Attachments: TreeSample2_Portfolio_Change.png, random-forest.zip
>
>
> We tried to use the machine learning capabilities and discovered a bug in 
> the implementation of Random Forest. When comparing Ignite's output with a 
> Python prototype (scikit-learn lib), we noticed that Ignite's predictions have 
> much lower accuracy despite using the same data set and model parameters.
> Further investigation showed that Ignite generates decision trees that kinda 
> "loop". The tree starts checking the same condition over and over until it 
> reaches the maximum tree depth.
> I've attached a standalone reproducer which uses a small excerpt of our data 
> set.
> It loads data from the csv file, then trains the model for just 1 tree. Then 
> the reproducer finds one of the looping branches and prints it. You will see 
> that every single node in the branch uses the same feature and value, and has 
> the same calculated impurity.
> On my machine the code reproduces this issue 100% of the time.
> I've also attached an example of the tree generated by Python's scikit-learn 
> on the same data set with the same parameters. In Python the tree usually 
> doesn't get deeper than 20 nodes.





[jira] [Resolved] (IGNITE-19712) Handle rebalance wrt indexes

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko resolved IGNITE-19712.
--
Resolution: Duplicate

> Handle rebalance wrt indexes
> 
>
> Key: IGNITE-19712
> URL: https://issues.apache.org/jira/browse/IGNITE-19712
> Project: Ignite
>  Issue Type: Bug
>Reporter: Semyon Danilov
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> After IGNITE-19363, index storages are no longer lazily instantiated. Need to 
> listen to assignment changes and start new storages





[jira] [Assigned] (IGNITE-19712) Handle rebalance wrt indexes

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko reassigned IGNITE-19712:


Assignee: Kirill Tkalenko

> Handle rebalance wrt indexes
> 
>
> Key: IGNITE-19712
> URL: https://issues.apache.org/jira/browse/IGNITE-19712
> Project: Ignite
>  Issue Type: Bug
>Reporter: Semyon Danilov
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> After IGNITE-19363, index storages are no longer lazily instantiated. Need to 
> listen to assignment changes and start new storages





[jira] [Updated] (IGNITE-19712) Handle rebalance wrt indexes

2024-03-11 Thread Kirill Tkalenko (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko updated IGNITE-19712:
-
Fix Version/s: 3.0.0-beta2

> Handle rebalance wrt indexes
> 
>
> Key: IGNITE-19712
> URL: https://issues.apache.org/jira/browse/IGNITE-19712
> Project: Ignite
>  Issue Type: Bug
>Reporter: Semyon Danilov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> After IGNITE-19363, index storages are no longer lazily instantiated. Need to 
> listen to assignment changes and start new storages





[jira] [Commented] (IGNITE-20139) RandomForestClassifierTrainer accuracy issue

2024-03-11 Thread Igor Belyakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825163#comment-17825163
 ] 

Igor Belyakov commented on IGNITE-20139:


The issue happens when a “pure” node (with impurity* = 0) is present in the 
tree. We calculate impurity only for child nodes and not for the current node, 
and we do not check whether the node is “pure” and contains just one label. 
Because of that, the “bestSplit” calculation is executed for an already “pure” 
node and decides that all items should be moved to the left child node and none 
to the right (leaf) node, which gives 2 “pure” child nodes. Since we don’t 
calculate impurity for the current (parent) node, the 
{{parentNode.getImpurity() - split.get().getImpurity() > minImpurityDelta}} 
check is always true, and we keep splitting the already “pure” node until the 
max tree depth is reached.
The following changes were made to resolve the issue:
 # Gain** calculation and a gain check for the split were added.
 # A node impurity check was added: once the impurity becomes 0, the node is 
“pure” and we don’t need to calculate a split for it.
 # Gini impurity calculation was changed to {{(1 - sum(p^2))}} to get the 
correct values in the range from 0 to 0.5, as required for the Gini index.

* Impurity is a value from 0 to 0.5 that shows whether the node is “pure” 
(impurity = 0), having just 1 label, or “impure”, with impurity = 0.5 being the 
worst case, where the label ratio is 1:1.
** Gain is the difference between the parent node’s impurity and the weighted 
impurity of the child nodes. The split that provides the maximum gain is 
considered the best. See [https://www.learndatasci.com/glossary/gini-impurity/]
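
The impurity and gain checks described above can be sketched as follows. This is not the code from the PR: the class, the method names and the use of per-label counts as input are assumptions made for illustration.

```java
/** Hypothetical illustration of Gini impurity, gain and the split decision. */
final class GiniSketch {
    private static long sum(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total;
    }

    /** Gini impurity 1 - sum(p^2), where p is the share of each label in the node. */
    static double impurity(long[] labelCounts) {
        long total = sum(labelCounts);
        if (total == 0) {
            return 0.0; // an empty node is treated as pure
        }
        double sumSq = 0.0;
        for (long c : labelCounts) {
            double p = (double) c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /** Gain: parent impurity minus the size-weighted impurity of the two children. */
    static double gain(long[] parent, long[] left, long[] right) {
        long nl = sum(left);
        long nr = sum(right);
        if (nl + nr == 0) {
            return 0.0;
        }
        double weighted = (nl * impurity(left) + nr * impurity(right)) / (nl + nr);
        return impurity(parent) - weighted;
    }

    /** A pure node is never split; otherwise the gain must exceed minImpurityDelta. */
    static boolean shouldSplit(long[] parent, long[] left, long[] right, double minImpurityDelta) {
        if (impurity(parent) == 0.0) {
            return false;
        }
        return gain(parent, left, right) > minImpurityDelta;
    }
}
```

For a node with a 1:1 label ratio the impurity is 0.5, and a split that separates the two labels completely gives a gain of 0.5; for a node holding a single label the split is skipped, which stops the "looping" branches.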

> RandomForestClassifierTrainer accuracy issue
> 
>
> Key: IGNITE-20139
> URL: https://issues.apache.org/jira/browse/IGNITE-20139
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.15
>Reporter: Alexandr Shapkin
>Assignee: Igor Belyakov
>Priority: Major
> Attachments: TreeSample2_Portfolio_Change.png, random-forest.zip
>
>
> We tried to use the machine learning capabilities and discovered a bug in 
> the implementation of Random Forest. When comparing Ignite's output with a 
> Python prototype (scikit-learn lib), we noticed that Ignite's predictions have 
> much lower accuracy despite using the same data set and model parameters.
> Further investigation showed that Ignite generates decision trees that kinda 
> "loop". The tree starts checking the same condition over and over until it 
> reaches the maximum tree depth.
> I've attached a standalone reproducer which uses a small excerpt of our data 
> set.
> It loads data from the csv file, then trains the model for just 1 tree. Then 
> the reproducer finds one of the looping branches and prints it. You will see 
> that every single node in the branch uses the same feature and value, and has 
> the same calculated impurity.
> On my machine the code reproduces this issue 100% of the time.
> I've also attached an example of the tree generated by Python's scikit-learn 
> on the same data set with the same parameters. In Python the tree usually 
> doesn't get deeper than 20 nodes.





[jira] [Commented] (IGNITE-18879) Leaseholder candidates balancing

2024-03-11 Thread yexiaowei (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825157#comment-17825157
 ] 

yexiaowei commented on IGNITE-18879:


[~Denis Chudov] I would like to ask a question about Metastorage. I noticed 
that Metastorage relies on a Raft group. However, in many cases Metastorage 
data is read neither from the leader nor via Raft ReadIndex. Could this lead to 
reading outdated data? One example is reading lease information directly from 
the PD via PlacementDriver#currentLease.

> Leaseholder candidates balancing
> 
>
> Key: IGNITE-18879
> URL: https://issues.apache.org/jira/browse/IGNITE-18879
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> Primary replicas (leaseholders) should be evenly distributed over the cluster 
> to balance the transactional load between nodes. As the placement driver 
> assigns primary replicas, balancing the primary replicas is also its 
> responsibility. A naive implementation of balancing should choose a node as a 
> leaseholder candidate in a way that preserves an even lease distribution over 
> all nodes. In a real cluster, it may take into account slow nodes, hot table 
> records, etc. If a lease candidate declines the LeaseGrantMessage from the 
> placement driver, the balancer should decide whether to choose another 
> candidate for the given primary replica or enforce the previously chosen one. 
> The balancing algorithm should therefore be pluggable, so that we can improve 
> it, replace it, or compare it with others.
> *Definition of done*
> An interface for the lease candidates balancer is introduced, along with a 
> simple implementation sustaining even lease distribution, which is used by the 
> placement driver by default. No public or internal configuration is needed at 
> this stage.
> *Implementation notes*
> Lease candidates balancer should have at least 2 methods:
>  - {_}get(group, ignoredNodes){_}: returns candidate for the given group, a 
> node from ignoredNodes set can't be chosen as a candidate
>  - {_}considerRedirectProposal(group, candidate, proposedCandidate){_}: 
> processes redirect proposal for given group provided by given candidate 
> (previously chosen using _get_ method), proposedCandidate is the alternative 
> candidate. Returns candidate that should be enforced by placement driver.
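
The two methods from the implementation notes could take a shape like the sketch below. Everything here is an assumption made for illustration: the type names, the use of plain strings for groups and nodes, and the simple policy that accepts a redirect only when the proposed node holds no more leases than the current candidate.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical balancer interface with the two methods from the notes. */
interface LeaseCandidateBalancerSketch {
    /** Returns a candidate for the group; nodes from ignoredNodes cannot be chosen. */
    String get(String group, Set<String> ignoredNodes);

    /** Processes a redirect proposal and returns the candidate to be enforced. */
    String considerRedirectProposal(String group, String candidate, String proposedCandidate);
}

/** Hypothetical naive implementation that keeps the lease distribution even. */
final class EvenLeaseBalancerSketch implements LeaseCandidateBalancerSketch {
    private final List<String> nodes;
    private final Map<String, String> assigned = new HashMap<>(); // group -> chosen node

    EvenLeaseBalancerSketch(List<String> nodes) {
        this.nodes = nodes;
    }

    /** Number of groups assigned to the node, not counting the given group. */
    private int load(String node, String excludedGroup) {
        int n = 0;
        for (Map.Entry<String, String> e : assigned.entrySet()) {
            if (e.getValue().equals(node) && !e.getKey().equals(excludedGroup)) {
                n++;
            }
        }
        return n;
    }

    @Override
    public String get(String group, Set<String> ignoredNodes) {
        String best = null;
        for (String node : nodes) {
            if (ignoredNodes.contains(node)) {
                continue;
            }
            if (best == null || load(node, group) < load(best, group)) {
                best = node; // ties keep the earlier node in the list
            }
        }
        if (best != null) {
            assigned.put(group, best);
        }
        return best; // null when every node is ignored
    }

    @Override
    public String considerRedirectProposal(String group, String candidate, String proposedCandidate) {
        boolean accept = nodes.contains(proposedCandidate)
                && load(proposedCandidate, group) <= load(candidate, group);
        String enforced = accept ? proposedCandidate : candidate;
        assigned.put(group, enforced);
        return enforced;
    }
}
```

Keeping the policy behind the interface is what makes the algorithm pluggable: a smarter balancer that accounts for slow nodes or hot records could replace the naive one without changes in the placement driver.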


