[ https://issues.apache.org/jira/browse/IGNITE-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin reassigned IGNITE-21619:
--------------------------------------------

    Assignee: Alexander Lapin

> "Failed to get the primary replica" after massive data insert and node restart
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-21619
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21619
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql
>    Affects Versions: 3.0.0-beta2
>            Reporter: Andrey Khitrin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3, sql
>         Attachments: ignite-config.conf, ignite3db-0.log
>
>
> Steps to reproduce:
> 1. Start a 1-node cluster.
> 2. Create several tables (5, for example) in an aipersist zone.
> 3. Fill these tables with some data (1000 rows each, for example).
> 4. Verify that data is accessible via SQL.
> 5. Restart the node.
> 6. Try to fetch the same data again.
> Expected result: the data can be fetched.
> Actual result: the data is inaccessible.
> Trace on the client side:
> {code}
> java.sql.SQLException: Failed to get the primary replica [tablePartitionId=6_part_1]
>       at org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
>       at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
>       at org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:111)
>       ...
> {code}
> Trace in node log (attached):
> {code}
> 2024-02-28 12:36:34:807 +0500 [INFO][%ClusterFailoverTest_cluster_0%sql-execution-pool-0][JdbcQueryEventHandlerImpl] Exception while executing query [query=select sum(k1) from failoverTest00]
> org.apache.ignite.sql.SqlException: IGN-CMN-65535 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
>       at org.apache.ignite.internal.lang.SqlExceptionMapperUtil.mapToPublicSqlException(SqlExceptionMapperUtil.java:61)
>       at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.wrapIfNecessary(AsyncSqlCursorImpl.java:180)
>       at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.handleError(AsyncSqlCursorImpl.java:157)
>       at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.lambda$requestNextAsync$2(AsyncSqlCursorImpl.java:96)
>       at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
>       at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
>       at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>       at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>       at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$execute$18(ExecutionServiceImpl.java:864)
>       at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>       at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>       at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
>       at org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:83)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
>       at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:117)
>       at org.apache.ignite.internal.lang.SqlExceptionMapperUtil.mapToPublicSqlException(SqlExceptionMapperUtil.java:51)
>       ... 15 more
> Caused by: org.apache.ignite.internal.lang.IgniteInternalException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
>       at org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$1(ExceptionUtils.java:384)
>       at org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:446)
>       at org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:384)
>       at org.apache.ignite.internal.sql.engine.SqlQueryProcessor.lambda$primaryReplicas$2(SqlQueryProcessor.java:402)
>       at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
>       at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
>       at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>       at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>       at java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
>       at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>       at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>       ... 3 more
> Caused by: java.util.concurrent.CompletionException: org.apache.ignite.internal.placementdriver.PrimaryReplicaAwaitTimeoutException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f The primary replica await timed out [replicationGroupId=6_part_1, referenceTimestamp=HybridTimestamp [physical=2024-02-28 12:36:04:780 +0500, logical=0, composite=112007955400622080], currentLease=Lease [leaseholder=ClusterFailoverTest_cluster_0, leaseholderId=ee143400-ca69-401f-9ff8-6e1cc7e5b394, accepted=false, startTime=HybridTimestamp [physical=2024-02-28 12:36:04:048 +0500, logical=115, composite=112007955352649843], expirationTime=HybridTimestamp [physical=2024-02-28 12:38:04:048 +0500, logical=0, composite=112007963216969728], prolongable=false, replicationGroupId=6_part_1]]
>       at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
>       at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
>       at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990)
>       at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
>       ... 9 more
> Caused by: org.apache.ignite.internal.placementdriver.PrimaryReplicaAwaitTimeoutException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f The primary replica await timed out [replicationGroupId=6_part_1, referenceTimestamp=HybridTimestamp [physical=2024-02-28 12:36:04:780 +0500, logical=0, composite=112007955400622080], currentLease=Lease [leaseholder=ClusterFailoverTest_cluster_0, leaseholderId=ee143400-ca69-401f-9ff8-6e1cc7e5b394, accepted=false, startTime=HybridTimestamp [physical=2024-02-28 12:36:04:048 +0500, logical=115, composite=112007955352649843], expirationTime=HybridTimestamp [physical=2024-02-28 12:38:04:048 +0500, logical=0, composite=112007963216969728], prolongable=false, replicationGroupId=6_part_1]]
>       at org.apache.ignite.internal.placementdriver.leases.LeaseTracker.lambda$awaitPrimaryReplica$5(LeaseTracker.java:276)
>       at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
>       ... 10 more
> Caused by: java.util.concurrent.TimeoutException
>       ... 7 more
> {code}
> The issue is *not* reproducible in the following configurations:
> * aipersist with 2 nodes
> * rocksdb with 1 or 2 nodes
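
When reading the PrimaryReplicaAwaitTimeoutException in the node log above, it can help to decode the composite HybridTimestamp values and compare the lease window with the reference timestamp. The sketch below is only an illustration: the class and method names are hypothetical, and the bit layout (physical epoch milliseconds in the high bits, a 16-bit logical counter in the low bits) is an assumption inferred from the three physical/logical/composite triples printed in the trace, with which it is consistent. It does not use any Ignite API.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for decoding the composite timestamps printed in the trace. */
public class HybridTimestampDecode {
    // Assumption inferred from the trace: the low 16 bits hold the logical counter.
    static final int LOGICAL_BITS = 16;

    /** Physical part of a composite value, in epoch milliseconds. */
    public static long physicalMillis(long composite) {
        return composite >>> LOGICAL_BITS;
    }

    /** Logical counter of a composite value. */
    public static int logical(long composite) {
        return (int) (composite & ((1L << LOGICAL_BITS) - 1));
    }

    public static void main(String[] args) {
        // Values copied from the exception message in the node log.
        long reference = 112007955400622080L;       // referenceTimestamp
        long leaseStart = 112007955352649843L;      // currentLease.startTime
        long leaseExpiration = 112007963216969728L; // currentLease.expirationTime

        Instant ref = Instant.ofEpochMilli(physicalMillis(reference));
        Instant start = Instant.ofEpochMilli(physicalMillis(leaseStart));
        Instant end = Instant.ofEpochMilli(physicalMillis(leaseExpiration));

        System.out.println("reference (UTC):   " + ref + ", logical=" + logical(reference));
        System.out.println("lease start (UTC): " + start + ", logical=" + logical(leaseStart));
        System.out.println("lease end (UTC):   " + end + ", logical=" + logical(leaseExpiration));
        System.out.println("lease length:      " + Duration.between(start, end));

        // True when the reference timestamp falls inside the lease window.
        boolean covered = !ref.isBefore(start) && ref.isBefore(end);
        System.out.println("reference inside lease window: " + covered);
    }
}
```

Under that assumption, the reference timestamp falls inside the window of the current lease, yet the lease is reported with accepted=false, and the query failed at 12:36:34:807, roughly 30 seconds after the reference timestamp. That reading fits a lease that was granted after the restart but never accepted by the leaseholder, although the trace alone does not establish the cause.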



--
This message was sent by Atlassian Jira
(v8.20.10#820010)