[jira] [Updated] (IGNITE-22324) The exception "The primary replica has changed" on creation of 1000 tables

2024-05-24 Thread Igor (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor updated IGNITE-22324:
--
Description: 
*Steps to reproduce:*

1. Start cluster with 1 node with JVM options: "-Xms4096m -Xmx4096m"

2. Create 1000 tables with 200 varchar columns each  and insert 1 row into 
each. One by one.

*Expected result:*
Tables are created.

*Actual result:*

On table 850 the exception is thrown:
{code:java}
java.sql.SQLException: The primary replica has changed 
[expectedLeaseholderName=TablesAmountCapacityTest_cluster_0, 
currentLeaseholderName=null, 
expectedLeaseholderId=bf69f842-d6c8-4f7a-b7e4-96458a4d92cb, 
currentLeaseholderId=null, 
expectedEnlistmentConsistencyToken=112491691050598880, 
currentEnlistmentConsistencyToken=null]  at 
org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
  at 
org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)  
at 
org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeWithArguments(JdbcPreparedStatement.java:765)
  at 
org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:173)
  at 
org.gridgain.ai3tests.tests.amountcapacity.TablesAmountCapacityBaseTest.lambda$insertRowAndAssertTimeout$2(TablesAmountCapacityBaseTest.java:92)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)  at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  at java.base/java.lang.Thread.run(Thread.java:834){code}
In server logs there is an exception:
{code:java}
2024-05-23 17:57:19:570 + 
[WARNING][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable 
error during the request occurred (will be retried on the randomly selected 
node) [request=WriteActionRequestImpl [command=[0, 9, 41, -58, -128, -112, -21, 
-103, -45, -23, -57, 1], deserializedCommand=SafeTimeSyncCommandImpl 
[safeTimeLong=112491694408335429], groupId=3402_part_7], peer=Peer 
[consistentId=TablesAmountCapacityTest_cluster_0, idx=0], newPeer=Peer 
[consistentId=TablesAmountCapacityTest_cluster_0, idx=0]].
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
at 
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at 
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at 
java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at 
java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.util.concurrent.TimeoutException
... 7 more
2024-05-23 17:57:19:570 + 
[WARNING][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable 
error during the request occurred (will be retried on the randomly selected 
node) [request=WriteActionRequestImpl [command=[0, 9, 41, -106, -128, -108, 
-21, -103, -45, -23, -57, 1], deserializedCommand=SafeTimeSyncCommandImpl 
[safeTimeLong=112491694408400917], groupId=3402_part_21], peer=Peer 
[consistentId=TablesAmountCapacityTest_cluster_0, idx=0], newPeer=Peer 
[consistentId=TablesAmountCapacityTest_cluster_0, idx=0]].
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
at 
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at 
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at 
java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at 
java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 

[jira] [Updated] (IGNITE-22324) The exception "The primary replica has changed" on creation of 1000 tables

2024-05-24 Thread Igor (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor updated IGNITE-22324:
--
Affects Version/s: 3.0.0-beta2
   (was: 3.0.0-beta1)

> The exception "The primary replica has changed" on creation of 1000 tables
> --
>
> Key: IGNITE-22324
> URL: https://issues.apache.org/jira/browse/IGNITE-22324
> Project: Ignite
>  Issue Type: Bug
>  Components: general, persistence
>Affects Versions: 3.0.0-beta2
>Reporter: Igor
>Priority: Major
>  Labels: ignite-3
>
> *Steps to reproduce:*
> 1. Start cluster with 1 node with JVM options: "-Xms4096m -Xmx4096m"
> 2. Create 1000 tables with 200 varchar columns each  and insert 1 row into 
> each. One by one.
> *Expected result:*
> Tables are created.
> *Actual result:*
> On table 949 the exception is thrown:
> {code:java}
> java.sql.SQLException: Primary replica has expired, transaction will be 
> rolled back: [groupId = 1850_part_21, expected enlistment consistency token = 
> 112069202113202526, commit timestamp = HybridTimestamp [physical=2024-03-10 
> 03:13:16:057 +, logical=396, composite=112069207395991948], current 
> primary replica = null]
>   at 
> org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
>   at 
> org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
>   at 
> org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeWithArguments(JdbcPreparedStatement.java:765)
>   at 
> org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:173)
>   at 
> org.gridgain.ai3tests.tests.TablesAmountCapacityTest.lambda$insertRowAndAssertTimeout$1(TablesAmountCapacityTest.java:166)
>   at 
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834) {code}
> In server logs there is an exception:
> {code:java}
> 2024-03-10 03:13:24:222 + 
> [WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-8][TxManagerImpl]
>  Failed to finish Tx. The operation will be retried 
> [txId=018e2659-b09f-009c-23c0-6ab50001].
> java.util.concurrent.CompletionException: 
> org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
> IGN-REP-3 TraceId:7ff7e851-9f18-4212-b317-a70a0a92fdfe Replication is timed 
> out [replicaGrpId=1850_part_21]
>     at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>     at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
>     at 
> java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:704)
>     at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>     at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>     at 
> org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$0(ReplicaService.java:110)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: 
> org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
> IGN-REP-3 TraceId:7ff7e851-9f18-4212-b317-a70a0a92fdfe Replication is timed 
> out [replicaGrpId=1850_part_21]
>     ... 4 more
> 2024-03-10 03:13:24:290 + 
> [WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-22][TrackableNetworkMessageHandler]
>  Message handling has been too long [duration=67ms, message=[class 
> org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
> 2024-03-10 03:13:24:290 + 
> [WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-11][TrackableNetworkMessageHandler]
>  Message handling has been too long [duration=67ms, message=[class 
> org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
> 2024-03-10 03:13:24:290 + 
> [WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-19][TrackableNetworkMessageHandler]
>  Message handling has been too long [duration=67ms, message=[class 
> org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]]
> 2024-03-10 03:13:24:290 + 
> [WARNING][%TablesAmountCapacityTest_cluster_0%partition-operations-17][TrackableNetworkMessageHandler]
>  Message handling has b