[ https://issues.apache.org/jira/browse/IGNITE-20410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy reassigned IGNITE-20410: ------------------------------------------ Assignee: Roman Puchkovskiy > ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication > fails with ReplicationTimeoutException > ------------------------------------------------------------------------------------------------------------------- > > Key: IGNITE-20410 > URL: https://issues.apache.org/jira/browse/IGNITE-20410 > Project: Ignite > Issue Type: Bug > Reporter: Alexander Lapin > Assignee: Roman Puchkovskiy > Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication > fails with ReplicationTimeoutException because of Metadata unavailability on > replication node. > With some extra debug output I got following result: > {code:java} > applyUpdateCommand > applyCmdWithExceptionHandling messageType = 43, groupType = 9 > complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet > available, group '1_part_0', required level 3; rejecting ActionRequest with > EBUSY., leaderId=null], err = null > complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet > available, group '1_part_0', required level 3; rejecting ActionRequest with > EBUSY., leaderId=null], err = null > complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet > available, group '1_part_0', required level 3; rejecting ActionRequest with > EBUSY., leaderId=null], err = null > complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet > available, group '1_part_0', required level 3; rejecting ActionRequest with > EBUSY., leaderId=null], err = null > ... > ... > ... > complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet > available, group '1_part_0', required level 3; rejecting ActionRequest with > EBUSY., leaderId=null], err = null > complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet > available, group '1_part_0', required level 3; rejecting ActionRequest with > EBUSY., leaderId=null], err = null > complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet > available, group '1_part_0', required level 3; rejecting ActionRequest with > EBUSY., leaderId=null], err = null > applyCmdWithExceptionHandling RuntimeException > Replication is timed out [replicaGrpId=1_part_0] > org.apache.ignite.tx.TransactionException: IGN-REP-3 > TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out > [replicaGrpId=1_part_0] > at > app//org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$0(ExceptionUtils.java:378) > at > app//org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:461) > at > app//org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:378) > at > app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.wrapReplicationException(InternalTableImpl.java:1630) > at > app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$postEnlist$9(InternalTableImpl.java:521) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) > at > app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlistWithRetry$5(InternalTableImpl.java:495) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) > at > app//org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$3(ReplicaService.java:97) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) > at > java.base@18.0.2/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483) > at > java.base@18.0.2/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) > at > java.base@18.0.2/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) > at > java.base@18.0.2/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) > at > java.base@18.0.2/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) > at > java.base@18.0.2/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) > Caused by: > org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: > IGN-REP-3 TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed > out [replicaGrpId=1_part_0] > ... 9 more {code} > The problem occurs when test invariant is broken, meaning that despite the > {code:java} > transferLeadershipsTo(notInhibitedNodeIndex);{code} > the leader re-elected to {{{}nodeToInhibitMetaStorage{}}}, meaning that we do > expect node_0 (not inhibited) as leader, however the leader is node_1 > (inhibited). > {code:java} > Node to inhibit issart_lsppdr_1 > Inhibit start 1694674876362 > Before put > !!! Before wait > !!! After wait > complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet > available, group '1_part_0', required level 3, available level 2; rejecting > ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null > complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet > available, group '1_part_0', required level 3, available level 2; rejecting > ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = > null{code} > Please pay attention to the *node = issart_lsppdr_1* in complete resp = > ErrorResponseImpl [errorCode=1009 it's the same as > Node to inhibit issart_lsppdr_1 -- This message was sent by Atlassian Jira (v8.20.10#820010)