[ 
https://issues.apache.org/jira/browse/IGNITE-20410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy reassigned IGNITE-20410:
------------------------------------------

    Assignee: Roman Puchkovskiy

> ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication 
> fails with ReplicationTimeoutException
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-20410
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20410
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication 
> fails with ReplicationTimeoutException because metadata is unavailable on 
> the replication node.
> With some extra debug output I got the following result:
> {code:java}
> applyUpdateCommand
> applyCmdWithExceptionHandling messageType = 43, groupType = 9
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> ...
> ...
> ...
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> applyCmdWithExceptionHandling RuntimeException
> Replication is timed out [replicaGrpId=1_part_0]
> org.apache.ignite.tx.TransactionException: IGN-REP-3 
> TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out 
> [replicaGrpId=1_part_0]
>       at 
> app//org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$0(ExceptionUtils.java:378)
>       at 
> app//org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:461)
>       at 
> app//org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:378)
>       at 
> app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.wrapReplicationException(InternalTableImpl.java:1630)
>       at 
> app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$postEnlist$9(InternalTableImpl.java:521)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>       at 
> app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlistWithRetry$5(InternalTableImpl.java:495)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>       at 
> app//org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$3(ReplicaService.java:97)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
>       at 
> java.base@18.0.2/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
>       at 
> java.base@18.0.2/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
>       at 
> java.base@18.0.2/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
>       at 
> java.base@18.0.2/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
>       at 
> java.base@18.0.2/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
>       at 
> java.base@18.0.2/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> Caused by: 
> org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
> IGN-REP-3 TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed 
> out [replicaGrpId=1_part_0]
>       ... 9 more {code}
> The problem occurs when the test invariant is broken: despite the call
> {code:java}
> transferLeadershipsTo(notInhibitedNodeIndex);{code}
> the leader is re-elected to {{nodeToInhibitMetaStorage}}. We expect node_0 
> (not inhibited) to be the leader, but the leader is actually node_1 
> (inhibited).
> {code:java}
> Node to inhibit issart_lsppdr_1
> Inhibit start 1694674876362
> Before put
> !!! Before wait
> !!! After wait
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3, available level 2; rejecting 
> ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3, available level 2; rejecting 
> ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = 
> null{code}
> Note the *node = issart_lsppdr_1* in the "complete resp = ErrorResponseImpl 
> [errorCode=1009" lines: it is the same node as in 
> "Node to inhibit issart_lsppdr_1".
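
The broken invariant described above (leadership expected on node_0 but observed on node_1) could be checked explicitly in a test before it proceeds. The following is a minimal Java sketch, not code from Ignite: LeaderInvariant, awaitLeader, and the Supplier that reports the current leader are hypothetical names, and how the leader is actually queried in the test is left as an assumption.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of Ignite): polls until the observed leader matches the expected one. */
public final class LeaderInvariant {
    private LeaderInvariant() {
    }

    /**
     * Polls currentLeader up to maxAttempts times, pausing pauseMillis after each mismatch.
     *
     * @return true as soon as the reported leader equals expectedLeader,
     *         false if the attempts run out or the thread is interrupted.
     */
    public static boolean awaitLeader(
            Supplier<String> currentLeader,
            String expectedLeader,
            int maxAttempts,
            long pauseMillis
    ) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (expectedLeader.equals(currentLeader.get())) {
                return true;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report that the invariant was not confirmed.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

A test could call LeaderInvariant.awaitLeader(leaderSupplier, "node_0", 10, 100) after transferLeadershipsTo(notInhibitedNodeIndex), where leaderSupplier is an assumed way of reading the current leader, and fail fast when the result is false. Such a check only detects a re-election that has already happened; it does not prevent one that occurs later.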



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
