[jira] [Resolved] (IGNITE-20492) NPE in PartitionReplicaListener's primary replica retrieval
[ https://issues.apache.org/jira/browse/IGNITE-20492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladislav Pyatkov resolved IGNITE-20492. Resolution: Duplicate The issue is a duplicate of IGNITE-20484. > NPE in PartitionReplicaListener's primary replica retrieval > --- > > Key: IGNITE-20492 > URL: https://issues.apache.org/jira/browse/IGNITE-20492 > Project: Ignite > Issue Type: Bug >Reporter: Kirill Sizov >Priority: Blocker > Labels: ignite-3 > > PartitionReplicaListener.ensureReplicaIsPrimary has the following block of > code > {code:java} > if (expectedTerm != null) { > return placementDriver.getPrimaryReplica(replicationGroupId, now) > .thenCompose(primaryReplica -> { > long currentEnlistmentConsistencyToken = > primaryReplica.getStartTime().longValue(); > {code} > However, according to the placementDriver's contract, {{getPrimaryReplica}} > can complete with null: > {quote} > Same as awaitPrimaryReplica(ReplicationGroupId, HybridTimestamp) despite the > fact that given method await logic is bounded. It will wait for a primary > replica for a reasonable period of time, and complete a future with null if a > matching lease isn't found. Generally speaking reasonable here means enough > for distribution across cluster nodes. > {quote} > In that case ensureReplicaIsPrimary will crash with NPE: > {noformat} > ... 3 more > Caused by: java.lang.NullPointerException > at > org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$155(PartitionReplicaListener.java:2397) > ~[ignite-table-3.0.0-SNAPSHOT.jar:?] > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) > ~[?:?] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > ~[?:?] > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) > ~[?:?] 
> at > org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169) > ~[ignite-core-3.0.0-SNAPSHOT.jar:?] > at java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122) ~[?:?] > at > org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169) > ~[ignite-core-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103) > ~[ignite-core-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.server.WatchProcessor.invokeOnRevisionCallback(WatchProcessor.java:247) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$notifyWatches$2(WatchProcessor.java:148) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) > ~[?:?] > at > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) > ~[?:?] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
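A minimal sketch of the kind of null guard the report points toward, assuming simplified stand-in types. `ReplicaMeta`, `PrimaryReplicaMissException` and `resolveToken` are hypothetical names for illustration and are not the real Ignite classes. The idea is to fail the future explicitly when the bounded lookup completes with null, instead of dereferencing the result.

```java
import java.util.concurrent.CompletableFuture;

/** Illustrative sketch only: simplified stand-ins, not the real Ignite classes. */
public class PrimaryReplicaCheck {
    /** Assumed shape of the lease metadata; only the start time is modelled. */
    static final class ReplicaMeta {
        private final long startTime;

        ReplicaMeta(long startTime) {
            this.startTime = startTime;
        }

        long startTime() {
            return startTime;
        }
    }

    /** Hypothetical exception signalling that no primary replica was resolved. */
    static final class PrimaryReplicaMissException extends RuntimeException {
        PrimaryReplicaMissException(String message) {
            super(message);
        }
    }

    /**
     * Resolves the enlistment consistency token from the primary replica future.
     * A null result (no matching lease found within the bounded wait) fails the
     * returned future explicitly instead of causing a NullPointerException.
     */
    static CompletableFuture<Long> resolveToken(CompletableFuture<ReplicaMeta> primaryReplicaFuture) {
        return primaryReplicaFuture.thenCompose(primaryReplica -> {
            if (primaryReplica == null) {
                return CompletableFuture.<Long>failedFuture(
                        new PrimaryReplicaMissException("No primary replica found within the bounded wait"));
            }

            return CompletableFuture.completedFuture(primaryReplica.startTime());
        });
    }
}
```

Whether the caller should fail, retry, or fall back in the null case is a design decision for the actual fix (tracked in IGNITE-20484); the sketch only shows the failure being surfaced deliberately.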
[jira] [Commented] (IGNITE-20515) MappedFileMemoryProvider doesn't work while running on JDK 17
[ https://issues.apache.org/jira/browse/IGNITE-20515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770570#comment-17770570 ]

Ignite TC Bot commented on IGNITE-20515:

{panel:title=Branch: [pull/10961/head] Base: [master] : Possible Blockers (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform .NET (Core Linux){color} [[tests 0 TIMEOUT , Exit Code , TC_SERVICE_MESSAGE |https://ci2.ignite.apache.org/viewLog.html?buildId=7356042]]
{panel}
{panel:title=Branch: [pull/10961/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7354789&buildTypeId=IgniteTests24Java8_RunAll]

> MappedFileMemoryProvider doesn't work while running on JDK 17
> -
>
> Key: IGNITE-20515
> URL: https://issues.apache.org/jira/browse/IGNITE-20515
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.15
> Reporter: Ivan Daschinsky
> Assignee: Ivan Daschinsky
> Priority: Major
> Labels: ise
> Fix For: 2.16
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> {code}
> Cannot invoke "java.lang.reflect.Method.invoke(Object, Object[])" because "o.a.i.i.mem.file.MappedFile.map0" is null
> class org.apache.ignite.IgniteCheckedException: Cannot invoke "java.lang.reflect.Method.invoke(Object, Object[])" because "org.apache.ignite.internal.mem.file.MappedFile.map0" is null
>     at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7929)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:261)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:210)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:161)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3376)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3182)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>     at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.reflect.Method.invoke(Object, Object[])" because "org.apache.ignite.internal.mem.file.MappedFile.map0" is null
>     at org.apache.ignite.internal.mem.file.MappedFile.map(MappedFile.java:126)
>     at org.apache.ignite.internal.mem.file.MappedFile.<init>(MappedFile.java:65)
>     at org.apache.ignite.internal.mem.file.MappedFileMemoryProvider.nextRegion(MappedFileMemoryProvider.java:134)
>     at org.apache.ignite.internal.processors.cache.persistence.IgniteCacheDatabaseSharedManager$3.nextRegion(IgniteCacheDatabaseSharedManager.java:1419)
>     at org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.addSegment(PageMemoryNoStoreImpl.java:716)
>     at org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.start(PageMemoryNoStoreImpl.java:279)
> {code}
[jira] [Updated] (IGNITE-20471) Timeout exception from org.apache.ignite.sql.Session#execute() could be printed to log ambiguously
[ https://issues.apache.org/jira/browse/IGNITE-20471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mirza Aliev updated IGNITE-20471:
    Description:
*Motivation*

The following code produces two different log outputs for the same exception:
{code:java}
sql("CREATE TABLE TEST(ID INT PRIMARY KEY, VAL0 INT)");

IgniteSql sql = igniteSql();
Session ses = sql.sessionBuilder().build();

try {
    ses.execute(null, "INSERT INTO TEST VALUES (?, ?)", 1, 1);
} catch (Exception e) {
    log.error("EXCEPTION", e);
    throw e;
}
{code}
This log is printed when we call {{log.error("EXCEPTION", e);}}
{noformat}
[2023-09-29T17:58:48,717][ERROR][main][ItSqlAsynchronousApiTest] EXCEPTION
org.apache.ignite.lang.IgniteException: null
    at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?]
    at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772) ~[main/:?]
    at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:706) ~[main/:?]
    at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:543) ~[main/:?]
    at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:641) ~[main/:?]
    at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:494) ~[main/:?]
    at org.apache.ignite.internal.sql.AbstractSession.execute(AbstractSession.java:63) ~[main/:?]
    at org.apache.ignite.internal.sql.api.ItSqlAsynchronousApiTest.select(ItSqlAsynchronousApiTest.java:458) ~[integrationTest/:?]
    ...
Caused by: java.util.concurrent.CompletionException: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:54d81fd9-6453-4adb-863d-6e82b9c0cb08
    at org.apache.ignite.internal.sql.api.SessionImpl.lambda$executeAsync$3(SessionImpl.java:208) ~[main/:?]
    ...
Caused by: org.apache.ignite.lang.IgniteException
    at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:110) ~[main/:?]
    at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.wrapIfNecessary(AsyncSqlCursorImpl.java:100) ~[main/:?]
    at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.lambda$requestNextAsync$0(AsyncSqlCursorImpl.java:76) ~[main/:?]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907) ~[?:?]
    ...
Caused by: java.util.concurrent.TimeoutException
    at org.apache.ignite.internal.sql.engine.exec.ResolvedDependencies.fetchColocationGroup(ResolvedDependencies.java:60) ~[main/:?]
    at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.fetchColocationGroups(ExecutionServiceImpl.java:982) ~[main/:?]
    at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.mapFragments(ExecutionServiceImpl.java:850) ~[main/:?]
    at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$execute$11(ExecutionServiceImpl.java:654) ~[main/:?]
    ...
{noformat}
This one is printed after we {{throw e}}
{noformat}
org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:54d81fd9-6453-4adb-863d-6e82b9c0cb08
    at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710)
    at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772)
    at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:706)
    at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:543)
    at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:641)
    at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:494)
    at org.apache.ignite.internal.sql.AbstractSession.execute(AbstractSession.java:63)
    at org.apache.ignite.internal.sql.api.ItSqlAsynchronousApiTest.select(ItSqlAsynchronousApiTest.java:458)
    ...
Caused by: java.util.concurrent.CompletionException: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:54d81fd9-6453-4adb-863d-6e82b9c0cb08
    at org.apache.ignite.internal.sql.api.SessionImpl.lambda$executeAsync$3(SessionImpl.java:208)
    at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
    ...
Caused by: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:54d81fd9-6453-4adb-863d-6e82b9c0cb08
    at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:110)
    at
[jira] [Updated] (IGNITE-20519) Add causality token of the last update of catalog descriptors to CatalogObjectDescriptor
[ https://issues.apache.org/jira/browse/IGNITE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mirza Aliev updated IGNITE-20519:
    Summary: Add causality token of the last update of catalog descriptors to CatalogObjectDescriptor (was: Add causality token of the last update of catalog descriptors )

> Add causality token of the last update of catalog descriptors to CatalogObjectDescriptor
> 
>
> Key: IGNITE-20519
> URL: https://issues.apache.org/jira/browse/IGNITE-20519
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Priority: Major
> Labels: ignite-3
>
> *Motivation*
> It would be useful to record the causality token of the last update in
> {{CatalogObjectDescriptor}}. For example, this would let us call
> {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for
> a given {{CatalogZoneDescriptor}} and receive the data nodes that correspond
> to the correct version of the filter from that {{CatalogZoneDescriptor}}.
> *Implementation notes*
> This could be done by adding a {{causalityToken}} parameter to
> {{UpdateEntry#applyUpdate(Catalog catalog)}}, so that the token is propagated
> to every {{UpdateEntry}} where we recreate a {{CatalogObjectDescriptor}} and
> create a new version of the {{Catalog}}.
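The implementation note above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ZoneDescriptor`, `Catalog`, `UpdateEntry` and `AlterZoneFilterEntry` here are heavily simplified, hypothetical stand-ins, and the field name `updateToken` is invented for the example. The sketch shows the token being passed into `applyUpdate` and stamped onto the descriptor that the entry recreates.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: simplified stand-ins, not the real Ignite catalog classes. */
public class CausalityTokenSketch {
    /** Simplified zone descriptor that records the token of its last update. */
    static final class ZoneDescriptor {
        final int id;
        final String filter;
        final long updateToken; // hypothetical field name

        ZoneDescriptor(int id, String filter, long updateToken) {
            this.id = id;
            this.filter = filter;
            this.updateToken = updateToken;
        }
    }

    /** Immutable catalog snapshot; every update produces a new version. */
    static final class Catalog {
        final int version;
        final List<ZoneDescriptor> zones;

        Catalog(int version, List<ZoneDescriptor> zones) {
            this.version = version;
            this.zones = List.copyOf(zones);
        }
    }

    /** Assumed shape of the enriched method: the token travels next to the catalog. */
    interface UpdateEntry {
        Catalog applyUpdate(Catalog catalog, long causalityToken);
    }

    /** Hypothetical entry that changes a zone filter and stamps the recreated descriptor. */
    static final class AlterZoneFilterEntry implements UpdateEntry {
        private final int zoneId;
        private final String newFilter;

        AlterZoneFilterEntry(int zoneId, String newFilter) {
            this.zoneId = zoneId;
            this.newFilter = newFilter;
        }

        @Override
        public Catalog applyUpdate(Catalog catalog, long causalityToken) {
            List<ZoneDescriptor> zones = new ArrayList<>();

            for (ZoneDescriptor zone : catalog.zones) {
                // Only the recreated descriptor receives the new token; others keep theirs.
                zones.add(zone.id == zoneId
                        ? new ZoneDescriptor(zone.id, newFilter, causalityToken)
                        : zone);
            }

            return new Catalog(catalog.version + 1, zones);
        }
    }
}
```

With a descriptor stamped this way, a caller could pass the descriptor's token and zone id to a lookup such as the `dataNodes(long causalityToken, int zoneId)` method named in the issue.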
[jira] [Updated] (IGNITE-20492) NPE in PartitionReplicaListener's primary replica retrieval
[ https://issues.apache.org/jira/browse/IGNITE-20492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-20492:
    Priority: Blocker (was: Major)
[jira] [Updated] (IGNITE-20519) Add causality token of the last update of catalog descriptors
[ https://issues.apache.org/jira/browse/IGNITE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mirza Aliev updated IGNITE-20519:
    Description:
*Motivation*

It would be useful to record the causality token of the last update in {{CatalogObjectDescriptor}}. For example, this would let us call {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for a given {{CatalogZoneDescriptor}} and receive the data nodes that correspond to the correct version of the filter from that {{CatalogZoneDescriptor}}.

*Implementation notes*

This could be done by adding a {{causalityToken}} parameter to {{UpdateEntry#applyUpdate(Catalog catalog)}}, so that the token is propagated to every {{UpdateEntry}} where we recreate a {{CatalogObjectDescriptor}} and create a new version of the {{Catalog}}.

  was:
*Motivation*

It could be useful to add causality token of the last update of {{CatalogObjectDescriptor}}. For example, this will help us to call {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for the specified {{CatalogZoneDescriptor}}

*Implementation notes*

This could be done with enriching {{UpdateEntry#applyUpdate(Catalog catalog)}} with {{causalityToken}}, so we could propagate {{causalityToken}} to all {{UpdateEntry}}, where we recreate {{CatalogObjectDescriptor}} and create new version of {{Catalog}}

> Add causality token of the last update of catalog descriptors
> --
>
> Key: IGNITE-20519
> URL: https://issues.apache.org/jira/browse/IGNITE-20519
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Priority: Major
> Labels: ignite-3
>
> *Motivation*
> It would be useful to record the causality token of the last update in
> {{CatalogObjectDescriptor}}. For example, this would let us call
> {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for
> a given {{CatalogZoneDescriptor}} and receive the data nodes that correspond
> to the correct version of the filter from that {{CatalogZoneDescriptor}}.
> *Implementation notes*
> This could be done by adding a {{causalityToken}} parameter to
> {{UpdateEntry#applyUpdate(Catalog catalog)}}, so that the token is propagated
> to every {{UpdateEntry}} where we recreate a {{CatalogObjectDescriptor}} and
> create a new version of the {{Catalog}}.
[jira] [Updated] (IGNITE-20519) Add causality token of the last update of catalog descriptors
[ https://issues.apache.org/jira/browse/IGNITE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20519: - Description: *Motivation* It could be useful to add causality token of the last update of {{CatalogObjectDescriptor}}. For example, this will help us to call {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for the specified {{CatalogZoneDescriptor}} *Implementation notes* This could be done with enriching {{UpdateEntry#applyUpdate(Catalog catalog)}} with {{causalityToken}}, so we could propagate {{causalityToken}} to all {{UpdateEntry}}, where we recreate {{CatalogObjectDescriptor}} and create new version of {{Catalog}} was: *Motivation* It could be useful to add causality token of the last update of {{CatalogObjectDescriptor}}. For example, this will help us to call {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for the specified {{CatalogZoneDescriptor}} > Add causality token of the last update of catalog descriptors > -- > > Key: IGNITE-20519 > URL: https://issues.apache.org/jira/browse/IGNITE-20519 > Project: Ignite > Issue Type: Bug >Reporter: Mirza Aliev >Priority: Major > Labels: ignite-3 > > *Motivation* > It could be useful to add causality token of the last update of > {{CatalogObjectDescriptor}}. For example, this will help us to call > {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for > the specified {{CatalogZoneDescriptor}} > *Implementation notes* > This could be done with enriching {{UpdateEntry#applyUpdate(Catalog > catalog)}} with {{causalityToken}}, so we could propagate {{causalityToken}} > to all {{UpdateEntry}}, where we recreate {{CatalogObjectDescriptor}} and > create new version of {{Catalog}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20519) Add causality token of the last update of catalog descriptors
[ https://issues.apache.org/jira/browse/IGNITE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20519: - Description: *Motivation* It could be useful to add causality token of the last update of {{CatalogObjectDescriptor}}. For example, this will help us to call {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for the specified {{CatalogZoneDescriptor}} was: *Motivation* It could be useful to add causality token of the last update of {{CatalogObjectDescriptor}}. For example, this will help us to call dataNodes(int causalityToken) {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for the specified {{CatalogZoneDescriptor}} > Add causality token of the last update of catalog descriptors > -- > > Key: IGNITE-20519 > URL: https://issues.apache.org/jira/browse/IGNITE-20519 > Project: Ignite > Issue Type: Bug >Reporter: Mirza Aliev >Priority: Major > Labels: ignite-3 > > *Motivation* > It could be useful to add causality token of the last update of > {{CatalogObjectDescriptor}}. For example, this will help us to call > {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for > the specified {{CatalogZoneDescriptor}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20519) Add causality token of the last update of catalog descriptors
[ https://issues.apache.org/jira/browse/IGNITE-20519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20519: - Description: *Motivation* It could be useful to add causality token of the last update of {{CatalogObjectDescriptor}}. For example, this will help us to call dataNodes(int causalityToken) {{DistributionZoneManager#dataNodes(long causalityToken, int zoneId)}} for the specified {{CatalogZoneDescriptor}} > Add causality token of the last update of catalog descriptors > -- > > Key: IGNITE-20519 > URL: https://issues.apache.org/jira/browse/IGNITE-20519 > Project: Ignite > Issue Type: Bug >Reporter: Mirza Aliev >Priority: Major > Labels: ignite-3 > > *Motivation* > It could be useful to add causality token of the last update of > {{CatalogObjectDescriptor}}. For example, this will help us to call > dataNodes(int causalityToken) {{DistributionZoneManager#dataNodes(long > causalityToken, int zoneId)}} for the specified {{CatalogZoneDescriptor}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20397) java.lang.AssertionError: Group of the event is unsupported
[ https://issues.apache.org/jira/browse/IGNITE-20397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Uttsel updated IGNITE-20397:
    Description:
h3. Motivation
{code:java}
java.lang.AssertionError: Group of the event is unsupported [nodeId=<11_part_18/isaat_n_2>, event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224) ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191) ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) ~[disruptor-3.3.7.jar:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
{code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
# The StripedDisruptor.StripeEntryHandler#onEvent method looks up the handler in StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
# In some cases the `subscribers` map is cleared by a call to StripedDisruptor.StripeEntryHandler#unsubscribe (for example, when a table is dropped), and StripeEntryHandler then receives an event carrying SafeTimeSyncCommandImpl.
# This triggers the assertion `assert handler != null`.

The issue is not caused by the catalog feature changes. It reproduces when ItSqlAsynchronousApiTest#batchIncomplete is run with the RepeatedTest annotation, in which case the cluster is not restarted after each test. It can be reproduced frequently by adding a Thread.sleep to StripeEntryHandler#onEvent.

h3. Implementation notes

We decided that we can use LOG.warn() instead of an assert, because it is safe to skip this event if the table was dropped.
{code:java}
if (handler != null) {
    handler.onEvent(event, sequence, endOfBatch || subscribers.size() > 1 && !supportsBatches);
} else {
    LOG.warn(format("Group of the event is unsupported [nodeId={}, event={}]", event.nodeId(), event));
}
{code}
*Definition of done*

No assertion fails when the handler is null.

  was:
{code:java}
java.lang.AssertionError: Group of the event is unsupported [nodeId=<11_part_18/isaat_n_2>, event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224) ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191) ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) ~[disruptor-3.3.7.jar:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
{code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
# StripedDisruptor.StripeEntryHandler#onEvent method gets handler from StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
# In some cases the `subscribers` map is cleared by invocation of StripedDisruptor.StripeEntryHandler#unsubscribe (for example on table dropping), and then StripeEntryHandler receives event with SafeTimeSyncCommandImpl.
# It produces an assertion error: `assert handler != null`

The issue is not caused by the catalog feature changes. The issue is reproduced when I run the ItSqlAsynchronousApiTest#batchIncomplete with RepeatedTest annotation. In this case the cluster is not restarted after each tests. It possible to reproduced it frequently if add Thread.sleep in StripeEntryHandler#onEvent.
We decided that we can use LOG.warn() instead of an assert: {code:java} if (handler != null) { handler.onEvent(event, sequence, endOfBatch || subscribers.size() > 1 && !supportsBatches); } else { LOG.warn(format("Group of the event is unsupported [nodeId={}, event={}]", event.nodeId(), event)); } {code} > java.lang.AssertionError: Group of the event is unsupported > --- > > Key: IGNITE-20397 > URL: https://issues.apache.org/jira/browse/IGNITE-20397 > Project: Ignite > Issue Type: Bug >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3 > > h3. Motivation > {code:java} > java.lang.AssertionError: Group of the event is unsupported > [nodeId=<11_part_18/isaat_n_2>, > event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a] > at > org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224) > ~[ignite-raft-3.0.0-SNAPSHOT.jar:?] > at >
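The race described in IGNITE-20397 and the agreed handling can be illustrated with a small, self-contained sketch. It is deliberately simplified and partly hypothetical: events are plain strings, handlers are `Consumer<String>`, logging uses `java.util.logging` rather than the Ignite logger, and the class name `StripeHandlerSketch` is invented. It only demonstrates the "skip and warn when the subscriber is gone" behaviour, not the real disruptor machinery.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.logging.Logger;

/** Illustrative sketch only: a stripped-down stand-in for a stripe entry handler. */
public class StripeHandlerSketch {
    private static final Logger LOG = Logger.getLogger(StripeHandlerSketch.class.getName());

    /** Handlers keyed by node id, mirroring the role of the subscribers map in the issue. */
    private final Map<String, Consumer<String>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String nodeId, Consumer<String> handler) {
        subscribers.put(nodeId, handler);
    }

    void unsubscribe(String nodeId) {
        subscribers.remove(nodeId);
    }

    /**
     * Dispatches the event to the handler registered for the node.
     * If the handler was already removed (for example, the table was dropped),
     * the event is skipped with a warning instead of failing an assertion.
     *
     * @return true if a handler processed the event, false if it was skipped.
     */
    boolean onEvent(String nodeId, String event) {
        Consumer<String> handler = subscribers.get(nodeId);

        if (handler != null) {
            handler.accept(event);
            return true;
        }

        LOG.warning("Group of the event is unsupported [nodeId=" + nodeId + ", event=" + event + "]");
        return false;
    }
}
```

A late event that arrives after `unsubscribe` is simply dropped here, which matches the reasoning in the issue that skipping is safe once the table no longer exists.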
[jira] [Updated] (IGNITE-20471) Timeout exception from org.apache.ignite.sql.Session#execute() could be printed to log ambiguously
[ https://issues.apache.org/jira/browse/IGNITE-20471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20471: - Description: *Motivation* The following code prints the different logs: {code:java} sql("CREATE TABLE TEST(ID INT PRIMARY KEY, VAL0 INT)"); IgniteSql sql = igniteSql(); Session ses = sql.sessionBuilder().build(); try { ses.execute(null, "INSERT INTO TEST VALUES (?, ?)", 1, 1); } catch (Exception e) { log.error("EXCEPTION", e); throw e; } {code} This log is printed when we call {{log.error("EXCEPTION", e);}} {noformat} [2023-09-29T17:58:48,717][ERROR][main][ItSqlAsynchronousApiTest] EXCEPTION org.apache.ignite.lang.IgniteException: null at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?] at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772) ~[main/:?] at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:706) ~[main/:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:543) ~[main/:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:641) ~[main/:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:494) ~[main/:?] at org.apache.ignite.internal.sql.AbstractSession.execute(AbstractSession.java:63) ~[main/:?] at org.apache.ignite.internal.sql.api.ItSqlAsynchronousApiTest.select(ItSqlAsynchronousApiTest.java:458) ~[integrationTest/:?] ... Caused by: java.util.concurrent.CompletionException: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:54d81fd9-6453-4adb-863d-6e82b9c0cb08 at org.apache.ignite.internal.sql.api.SessionImpl.lambda$executeAsync$3(SessionImpl.java:208) ~[main/:?] ... Caused by: org.apache.ignite.lang.IgniteException at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:110) ~[main/:?] 
at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.wrapIfNecessary(AsyncSqlCursorImpl.java:100) ~[main/:?] at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.lambda$requestNextAsync$0(AsyncSqlCursorImpl.java:76) ~[main/:?] at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930) ~[?:?] at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907) ~[?:?] ... Caused by: java.util.concurrent.TimeoutException at org.apache.ignite.internal.sql.engine.exec.ResolvedDependencies.fetchColocationGroup(ResolvedDependencies.java:60) ~[main/:?] at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.fetchColocationGroups(ExecutionServiceImpl.java:982) ~[main/:?] at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.mapFragments(ExecutionServiceImpl.java:850) ~[main/:?] at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$execute$11(ExecutionServiceImpl.java:654) ~[main/:?] ... {noformat} {noformat} org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:54d81fd9-6453-4adb-863d-6e82b9c0cb08 at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772) at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:706) at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:543) at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:641) at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:494) at org.apache.ignite.internal.sql.AbstractSession.execute(AbstractSession.java:63) at org.apache.ignite.internal.sql.api.ItSqlAsynchronousApiTest.select(ItSqlAsynchronousApiTest.java:458) ... 
Caused by: java.util.concurrent.CompletionException: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:54d81fd9-6453-4adb-863d-6e82b9c0cb08 at org.apache.ignite.internal.sql.api.SessionImpl.lambda$executeAsync$3(SessionImpl.java:208) at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) ... Caused by: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:54d81fd9-6453-4adb-863d-6e82b9c0cb08 at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:110) at
[jira] [Updated] (IGNITE-20471) Timeout exception from org.apache.ignite.sql.Session#execute() could be printed to log ambiguously
[ https://issues.apache.org/jira/browse/IGNITE-20471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20471: - Description: *Motivation* {noformat} [2023-09-29T17:58:48,717][ERROR][main][ItSqlAsynchronousApiTest] EXCEPTION org.apache.ignite.lang.IgniteException: null at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?] at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772) ~[main/:?] at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:706) ~[main/:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:543) ~[main/:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:641) ~[main/:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:494) ~[main/:?] at org.apache.ignite.internal.sql.AbstractSession.execute(AbstractSession.java:63) ~[main/:?] at org.apache.ignite.internal.sql.api.ItSqlAsynchronousApiTest.select(ItSqlAsynchronousApiTest.java:458) ~[integrationTest/:?] ... Caused by: java.util.concurrent.CompletionException: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:54d81fd9-6453-4adb-863d-6e82b9c0cb08 at org.apache.ignite.internal.sql.api.SessionImpl.lambda$executeAsync$3(SessionImpl.java:208) ~[main/:?] ... Caused by: org.apache.ignite.lang.IgniteException at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:110) ~[main/:?] at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.wrapIfNecessary(AsyncSqlCursorImpl.java:100) ~[main/:?] at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.lambda$requestNextAsync$0(AsyncSqlCursorImpl.java:76) ~[main/:?] at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930) ~[?:?] 
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907) ~[?:?] ... Caused by: java.util.concurrent.TimeoutException at org.apache.ignite.internal.sql.engine.exec.ResolvedDependencies.fetchColocationGroup(ResolvedDependencies.java:60) ~[main/:?] at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.fetchColocationGroups(ExecutionServiceImpl.java:982) ~[main/:?] at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.mapFragments(ExecutionServiceImpl.java:850) ~[main/:?] at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$execute$11(ExecutionServiceImpl.java:654) ~[main/:?] ... {noformat} was: *Motivation* According to the invocation logic of {{TxManagerImpl#finish}}, the {{recipientNode}} passed to {{finish}} could be {{null}}. Further down, the {{finish}} method calls {{replicaService.invoke(recipientNode)}}, which could lead to a {{NullPointerException}}. UPD1: It is possible that I was wrong and we don't even reach the {{replicaService.invoke(recipientNode)}} call, because we check {{groups.isEmpty()}} first and it seems that we take the other branch. Need to investigate why I got {{null}} when running {{ItTableRaftSnapshotsTest#entriesKeepAppendedAfterSnapshotInstallation}} > Timeout exception from org.apache.ignite.sql.Session#execute() could be > printed to log ambiguously > -- > > Key: IGNITE-20471 > URL: https://issues.apache.org/jira/browse/IGNITE-20471 > Project: Ignite > Issue Type: Bug >Reporter: Mirza Aliev >Priority: Major > Labels: ignite-3 > > *Motivation* > {noformat} > [2023-09-29T17:58:48,717][ERROR][main][ItSqlAsynchronousApiTest] EXCEPTION > org.apache.ignite.lang.IgniteException: null > at > java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) > ~[?:?]
> at > org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772) > ~[main/:?] > at > org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:706) > ~[main/:?] > at > org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:543) > ~[main/:?] > at > org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:641) > ~[main/:?] > at > org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:494) > ~[main/:?] > at >
[jira] [Updated] (IGNITE-20471) Timeout exception from org.apache.ignite.sql.Session#execute() could be printed to log ambiguously
[ https://issues.apache.org/jira/browse/IGNITE-20471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20471: - Summary: Timeout exception from org.apache.ignite.sql.Session#execute() could be printed to log ambiguously (was: Handle TxManagerImpl#finish correctly when recipientNode is null) > Timeout exception from org.apache.ignite.sql.Session#execute() could be > printed to log ambiguously > -- > > Key: IGNITE-20471 > URL: https://issues.apache.org/jira/browse/IGNITE-20471 > Project: Ignite > Issue Type: Bug >Reporter: Mirza Aliev >Priority: Major > Labels: ignite-3 > > *Motivation* > According to the invocation logic of {{TxManagerImpl#finish}}, the > {{recipientNode}} passed to {{finish}} could be {{null}}. Further down, the > {{finish}} method calls {{replicaService.invoke(recipientNode)}}, which could > lead to a {{NullPointerException}}. > UPD1: > It is possible that I was wrong and we don't even reach the > {{replicaService.invoke(recipientNode)}} call, because we check > {{groups.isEmpty()}} first and it seems that we take the other branch. > Need to investigate why I got {{null}} when running > {{ItTableRaftSnapshotsTest#entriesKeepAppendedAfterSnapshotInstallation}}
[jira] [Updated] (IGNITE-20485) Allow to configure lease interval
[ https://issues.apache.org/jira/browse/IGNITE-20485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladislav Pyatkov updated IGNITE-20485: --- Description: *Motivation* Currently, the lease interval depends on the lease update frequency and is calculated like this: {code:title=LeaseUpdater.java} private static final long LEASE_INTERVAL = 10 * UPDATE_LEASE_MS; {code} The interval cannot be configured, so tests take longer than they would with a short lease interval. *Implementation notes* * Create a new configuration property root. The property has to be specified for the placement driver only (PlacementDriverConfigurationScema). * The property should be configured for the entire cluster (ConfigurationType#DISTRIBUTED). * The property may change on a running cluster. The placement driver manager has to handle configuration updates. * Two parameters should be calculated from the property: the lease interval and the long lease interval (LeaseUpdater#LEASE_INTERVAL, LeaseUpdater#longLeaseInterval). The frequency of checking leases is evaluated based on the lease interval (LeaseUpdater#UPDATE_LEASE_MS). * In addition to tuning through the configuration framework, we should provide configuration through system properties for use in tests. Add a system property for the lease interval only, because one already exists for the long lease interval. * Do not forget to check TODOs. *Definition of done* The lease interval can be configured, at least through system properties, for use in tests. It should also be configurable through an Ignite property. was: *Motivation* Currently, the lease interval depends on the lease update frequency and is calculated like this: {code:title=LeaseUpdater.java} private static final long LEASE_INTERVAL = 10 * UPDATE_LEASE_MS; {code} The interval cannot be configured, so tests take longer than they would with a short lease interval.
*Implementation notes* Do not forget to check TODOs. *Definition of done* The lease interval can be configured, at least through system properties, for use in tests. It should also be configurable through an Ignite property. > Allow to configure lease interval > - > > Key: IGNITE-20485 > URL: https://issues.apache.org/jira/browse/IGNITE-20485 > Project: Ignite > Issue Type: Improvement >Reporter: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > > *Motivation* > Currently, the lease interval depends on the lease update frequency and is > calculated like this: > {code:title=LeaseUpdater.java} > private static final long LEASE_INTERVAL = 10 * UPDATE_LEASE_MS; > {code} > The interval cannot be configured, so tests take longer than they would > with a short lease interval. > *Implementation notes* > * Create a new configuration property root. The property has to be specified > for the placement driver only (PlacementDriverConfigurationScema). > * The property should be configured for the entire cluster > (ConfigurationType#DISTRIBUTED). > * The property may change on a running cluster. The placement driver manager > has to handle configuration updates. > * Two parameters should be calculated from the property: the lease interval > and the long lease interval (LeaseUpdater#LEASE_INTERVAL, > LeaseUpdater#longLeaseInterval). The frequency of checking leases is > evaluated based on the lease interval (LeaseUpdater#UPDATE_LEASE_MS). > * In addition to tuning through the configuration framework, we should > provide configuration through system properties for use in tests. Add a > system property for the lease interval only, because one already exists for > the long lease interval. > * Do not forget to check TODOs. > *Definition of done* > The lease interval can be configured, at least through system properties, > for use in tests. > It should also be configurable through an Ignite property.
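The system-property part of the IGNITE-20485 implementation notes could look roughly like the sketch below. It is an illustration only, not the Ignite implementation: the property name EXAMPLE_LEASE_INTERVAL_MS, the default value, and the class and method names are all hypothetical. Only the ratio LEASE_INTERVAL = 10 * UPDATE_LEASE_MS is taken from the issue text.

```java
/**
 * Sketch only: resolves a lease interval from a system property and derives
 * the lease update period from it. All names here are hypothetical.
 */
final class LeaseIntervalSketch {
    /** Hypothetical system property name, not taken from the Ignite code base. */
    static final String LEASE_INTERVAL_PROP = "EXAMPLE_LEASE_INTERVAL_MS";

    /** Assumed default, chosen for illustration only. */
    static final long DEFAULT_LEASE_INTERVAL_MS = 5_000L;

    /** Ratio quoted in the issue: LEASE_INTERVAL = 10 * UPDATE_LEASE_MS. */
    static final long UPDATES_PER_LEASE = 10L;

    private LeaseIntervalSketch() {
    }

    /** Parses a positive number of milliseconds, falling back on missing or invalid input. */
    static long parseOrDefault(String raw, long fallback) {
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /** Reads the lease interval from the (hypothetical) system property. */
    static long leaseIntervalMs() {
        return parseOrDefault(System.getProperty(LEASE_INTERVAL_PROP), DEFAULT_LEASE_INTERVAL_MS);
    }

    /** Derives the update period from the lease interval, never going below 1 ms. */
    static long updateLeaseMs(long leaseIntervalMs) {
        return Math.max(1L, leaseIntervalMs / UPDATES_PER_LEASE);
    }

    public static void main(String[] args) {
        long lease = leaseIntervalMs();
        System.out.println("lease interval ms = " + lease + ", update period ms = " + updateLeaseMs(lease));
    }
}
```

A test could then start the JVM with -DEXAMPLE_LEASE_INTERVAL_MS=200 to shorten leases. The distributed configuration property and the handling of live updates described in the notes are not covered by this sketch.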
[jira] [Updated] (IGNITE-20484) NPE when some operation occurs when the primary replica is changing
[ https://issues.apache.org/jira/browse/IGNITE-20484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Sizov updated IGNITE-20484: --- Description: *Motivation* It happens that when the request is created, the primary replica is in this node, but when the request is executed in the replica, it has already lost its role. {noformat} [2023-09-25T11:03:24,408][WARN ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to process replica request [request=ReadWriteSingleRowReplicaRequestImpl [binaryRowMessage=BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], full=true, groupId=4_part_0, requestType=RW_UPSERT, term=24742070009862, timestampLong=24742430588928, transactionId=018acb5d-4e54-0006--705db0b1]] java.util.concurrent.CompletionException: java.lang.NullPointerException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081) ~[?:?] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?] at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[?:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169) ~[main/:?] at java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122) ~[?:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169) ~[main/:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103) ~[main/:?] 
at org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146) ~[main/:?] at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849) ~[main/:?] at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456) ~[main/:?] at org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:269) ~[main/:?] at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?] at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at java.lang.Thread.run(Thread.java:834) [?:?] Caused by: java.lang.NullPointerException at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$161(PartitionReplicaListener.java:2415) ~[main/:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[?:?] ... 15 more {noformat} *Definition of done* In this case, we should throw the correct exception because the request can no longer be handled by this replica, and the corresponding transaction will be rolled back. *Implementation notes* Do not forget to check all places where the issue is mentioned (especially in the TODO section). As discussed with [~sanpwc]: This exception is likely to be thrown when - we successfully get a primary replica on one node - we send a message and the message is slightly slow to be delivered - we handle the received message on the recipient node and run {{placementDriver.getPrimaryReplica}}.
If the previous lease has expired by the time we handle the message, the call to {{placementDriver}} will result in a {{null}} value instead of a {{ReplicaMeta}} instance. Hence the NPE. was: *Motivation* It happens that when the request is created, the primary replica is in this node, but when the request is executed in the replica, it has already lost its role. {noformat} [2023-09-25T11:03:24,408][WARN ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to process replica request [request=ReadWriteSingleRowReplicaRequestImpl [binaryRowMessage=BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], full=true, groupId=4_part_0, requestType=RW_UPSERT, term=24742070009862, timestampLong=24742430588928, transactionId=018acb5d-4e54-0006--705db0b1]] java.util.concurrent.CompletionException: java.lang.NullPointerException at
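The failure mode described above, and the kind of null check that avoids it, can be sketched as follows. This is a self-contained illustration, not the actual fix: LeaseMeta stands in for the {{ReplicaMeta}} returned by the placement driver, and PrimaryReplicaMissingException and ensurePrimary are hypothetical names. The only behaviour assumed from the issue is that the future may complete with null once the lease has expired.

```java
import java.util.concurrent.CompletableFuture;

/** Sketch only: guards against a primary-replica future that completes with null. */
final class PrimaryCheckSketch {
    /** Hypothetical stand-in for the lease metadata returned by the placement driver. */
    static final class LeaseMeta {
        final long startTime;

        LeaseMeta(long startTime) {
            this.startTime = startTime;
        }
    }

    /** Hypothetical exception signalling that this node cannot serve the request as primary. */
    static final class PrimaryReplicaMissingException extends RuntimeException {
        PrimaryReplicaMissingException(String message) {
            super(message);
        }
    }

    private PrimaryCheckSketch() {
    }

    /**
     * Completes normally only if a primary replica exists and its lease start time
     * matches the term carried by the request; otherwise fails with a dedicated
     * exception instead of a NullPointerException.
     */
    static CompletableFuture<Void> ensurePrimary(CompletableFuture<LeaseMeta> primaryFuture, long expectedTerm) {
        return primaryFuture.thenCompose(primary -> {
            if (primary == null) {
                // The lease expired before the request was handled.
                return CompletableFuture.<Void>failedFuture(
                        new PrimaryReplicaMissingException("No primary replica for the requested timestamp"));
            }
            if (primary.startTime != expectedTerm) {
                // Another lease is active, so the term in the request is stale.
                return CompletableFuture.<Void>failedFuture(
                        new PrimaryReplicaMissingException("Primary replica has changed"));
            }
            return CompletableFuture.<Void>completedFuture(null);
        });
    }

    public static void main(String[] args) {
        CompletableFuture<LeaseMeta> expired = CompletableFuture.completedFuture(null);
        ensurePrimary(expired, 42L).whenComplete((ok, err) ->
                System.out.println(err == null ? "primary confirmed" : "rejected: " + err));
    }
}
```

With a check of this shape the caller receives a meaningful failure and can roll the transaction back, which is what the definition of done asks for.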
[jira] [Updated] (IGNITE-20484) NPE when some operation occurs when the primary replica is changing
[ https://issues.apache.org/jira/browse/IGNITE-20484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Sizov updated IGNITE-20484: --- Description: *Motivation* It happens that when the request is created, the primary replica is in this node, but when the request is executed in the replica, it has already lost its role. {noformat} [2023-09-25T11:03:24,408][WARN ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to process replica request [request=ReadWriteSingleRowReplicaRequestImpl [binaryRowMessage=BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], full=true, groupId=4_part_0, requestType=RW_UPSERT, term=24742070009862, timestampLong=24742430588928, transactionId=018acb5d-4e54-0006--705db0b1]] java.util.concurrent.CompletionException: java.lang.NullPointerException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081) ~[?:?] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?] at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[?:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169) ~[main/:?] at java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122) ~[?:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169) ~[main/:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103) ~[main/:?] 
at org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146) ~[main/:?] at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849) ~[main/:?] at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456) ~[main/:?] at org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:269) ~[main/:?] at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?] at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at java.lang.Thread.run(Thread.java:834) [?:?] Caused by: java.lang.NullPointerException at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$161(PartitionReplicaListener.java:2415) ~[main/:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[?:?] ... 15 more {noformat} *Definition of done* In this case, we should throw the correct exception because the request can no longer be handled by this replica, and the corresponding transaction will be rolled back. *Implementation notes* Do not forget to check all places where the issue is mentioned (especially in the TODO section). As discussed with [~sanpwc]: This exception is likely to be thrown when - we successfully get a primary replica on one node - we send a message and the message is slightly slow to be delivered - we handle the received message on the recipient node and run {{placementDriver.getPrimaryReplica}}.
If the previous lease has expired by the time we handle the message, the call to {{placementDriver}} will result in a {{null}} value instead of a {{ReplicaMeta}} instance. Any call with no null check on it may end up with NPE. Calling was: *Motivation* It happens that when the request is created, the primary replica is in this node, but when the request is executed in the replica, it has already lost its role. {noformat} [2023-09-25T11:03:24,408][WARN ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to process replica request [request=ReadWriteSingleRowReplicaRequestImpl [binaryRowMessage=BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], full=true, groupId=4_part_0, requestType=RW_UPSERT, term=24742070009862, timestampLong=24742430588928, transactionId=018acb5d-4e54-0006--705db0b1]] java.util.concurrent.CompletionException: java.lang.NullPointerException at
[jira] [Updated] (IGNITE-20484) NPE when some operation occurs when the primary replica is changing
[ https://issues.apache.org/jira/browse/IGNITE-20484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Sizov updated IGNITE-20484: --- Description: *Motivation* It happens that when the request is created, the primary replica is in this node, but when the request is executed in the replica, it has already lost its role. {noformat} [2023-09-25T11:03:24,408][WARN ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to process replica request [request=ReadWriteSingleRowReplicaRequestImpl [binaryRowMessage=BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], full=true, groupId=4_part_0, requestType=RW_UPSERT, term=24742070009862, timestampLong=24742430588928, transactionId=018acb5d-4e54-0006--705db0b1]] java.util.concurrent.CompletionException: java.lang.NullPointerException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081) ~[?:?] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?] at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[?:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169) ~[main/:?] at java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122) ~[?:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169) ~[main/:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103) ~[main/:?] 
at org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146) ~[main/:?] at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849) ~[main/:?] at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456) ~[main/:?] at org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:269) ~[main/:?] at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?] at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at java.lang.Thread.run(Thread.java:834) [?:?] Caused by: java.lang.NullPointerException at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$161(PartitionReplicaListener.java:2415) ~[main/:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[?:?] ... 15 more {noformat} *Definition of done* In this case, we should throw the correct exception because the request cannot be handled in this replica anymore, and the matched transaction will be rolled back. *Implementation notes* Do not forget to check all places where the issue is mentioned (especially in TODO section). As discussed with [~sanpwc]: This exception is likely to be thrown when - we successfully get a primary replica on one node - send a message and the message is slightly slow to be delivered - we handle the received message on the recipient node and run {{placementDriver.getPrimaryReplica}}.
If the previous lease has expired by the time we handle the message, the call to {{placementDriver}} will result in a {{null}} value instead of {{ReplicaMeta}}. Any call with no null check on it may end up with NPE. Calling was: *Motivation* It happens that when the request is created, the primary replica is in this node, but when the request is executed in the replica, it has already lost its role. {noformat} [2023-09-25T11:03:24,408][WARN ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to process replica request [request=ReadWriteSingleRowReplicaRequestImpl [binaryRowMessage=BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], full=true, groupId=4_part_0, requestType=RW_UPSERT, term=24742070009862, timestampLong=24742430588928, transactionId=018acb5d-4e54-0006--705db0b1]] java.util.concurrent.CompletionException: java.lang.NullPointerException at
[jira] [Updated] (IGNITE-20484) NPE when some operation occurs when the primary replica is changing
[ https://issues.apache.org/jira/browse/IGNITE-20484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Sizov updated IGNITE-20484: --- Description: *Motivation* It happens that when the request is created, the primary replica is in this node, but when the request is executed in the replica, it has already lost its role. {noformat} [2023-09-25T11:03:24,408][WARN ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to process replica request [request=ReadWriteSingleRowReplicaRequestImpl [binaryRowMessage=BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], full=true, groupId=4_part_0, requestType=RW_UPSERT, term=24742070009862, timestampLong=24742430588928, transactionId=018acb5d-4e54-0006--705db0b1]] java.util.concurrent.CompletionException: java.lang.NullPointerException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081) ~[?:?] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?] at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[?:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169) ~[main/:?] at java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122) ~[?:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169) ~[main/:?] at org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103) ~[main/:?] 
at org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146) ~[main/:?] at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849) ~[main/:?] at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456) ~[main/:?] at org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$advanceSafeTime$7(WatchProcessor.java:269) ~[main/:?] at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?] at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at java.lang.Thread.run(Thread.java:834) [?:?] Caused by: java.lang.NullPointerException at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$161(PartitionReplicaListener.java:2415) ~[main/:?] at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[?:?] ... 15 more {noformat} *Definition of done* In this case, we should throw the correct exception because the request cannot be handled in this replica anymore, and the matched transaction will be rolled back. *Implementation notes* Do not forget to check all places where the issue is mentioned (especially in TODO section). As discussed with [~sanpwc]: This exception is likely to be thrown when - we get primary replica on one node - send a message and the message is slightly slow to be delivered - we handle the received message on a node and run {{placementDriver.getPrimaryReplica}}. If the previous lease has expired by the time we handle the message, the call to {{placementDriver}} will result in a {{null}} value instead of {{ReplicaMeta}}. 
Any call with no null check on it may end up with NPE. Calling was: *Motivation* It happens that when the request is created, the primary replica is in this node, but when the request is executed in the replica, it has already lost its role. {noformat} [2023-09-25T11:03:24,408][WARN ][%iprct_tpclh_2%metastorage-watch-executor-2][ReplicaManager] Failed to process replica request [request=ReadWriteSingleRowReplicaRequestImpl [binaryRowMessage=BinaryRowMessageImpl [binaryTuple=java.nio.HeapByteBuffer[pos=0 lim=9 cap=9], schemaVersion=1], commitPartitionId=TablePartitionIdMessageImpl [partitionId=0, tableId=4], full=true, groupId=4_part_0, requestType=RW_UPSERT, term=24742070009862, timestampLong=24742430588928, transactionId=018acb5d-4e54-0006--705db0b1]] java.util.concurrent.CompletionException: java.lang.NullPointerException at
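The null-completion case described in the implementation notes can be sketched as follows. This is only a minimal illustration under stated assumptions, not the actual fix: {{LeaseInfo}} stands in for {{ReplicaMeta}}, and {{PrimaryReplicaMissSketchException}}, {{ensurePrimary}} and {{outcome}} are hypothetical names introduced for the example. Only JDK {{CompletableFuture}} calls are used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class PrimaryReplicaCheckSketch {
    /** Hypothetical stand-in for ReplicaMeta; only the lease start time is modelled. */
    static final class LeaseInfo {
        final long startTime;

        LeaseInfo(long startTime) {
            this.startTime = startTime;
        }
    }

    /** Hypothetical exception: this node cannot serve the request as primary. */
    static final class PrimaryReplicaMissSketchException extends RuntimeException {
        PrimaryReplicaMissSketchException(String message) {
            super(message);
        }
    }

    /**
     * Fails the returned future with a dedicated exception instead of letting
     * a null lease surface as a NullPointerException inside thenCompose.
     */
    static CompletableFuture<Void> ensurePrimary(CompletableFuture<LeaseInfo> leaseFuture, long expectedTerm) {
        return leaseFuture.thenCompose(lease -> {
            if (lease == null) {
                // The lookup completed with null: no matching lease was found in time.
                return CompletableFuture.<Void>failedFuture(
                        new PrimaryReplicaMissSketchException("No primary replica lease found"));
            }
            if (lease.startTime != expectedTerm) {
                // A lease exists, but it is not the one the request was created against.
                return CompletableFuture.<Void>failedFuture(
                        new PrimaryReplicaMissSketchException("Primary replica has changed"));
            }
            return CompletableFuture.<Void>completedFuture(null);
        });
    }

    /** Small helper for the demo: returns "OK" or "MISS" for a given lease start (null = no lease). */
    static String outcome(Long leaseStart, long expectedTerm) {
        LeaseInfo lease = leaseStart == null ? null : new LeaseInfo(leaseStart);
        try {
            ensurePrimary(CompletableFuture.completedFuture(lease), expectedTerm).join();
            return "OK";
        } catch (CompletionException e) {
            if (e.getCause() instanceof PrimaryReplicaMissSketchException) {
                return "MISS";
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(null, 42L)); // lease expired before the message was handled
        System.out.println(outcome(42L, 42L));  // lease still matches the expected term
    }
}
```

With a check of this shape the caller sees a domain-level failure that it can map to a transaction rollback, rather than an NPE wrapped in a {{CompletionException}}.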
[jira] [Updated] (IGNITE-20408) Replace tx coordinator non-consistent ID with coordinator ClusterNode in local tx state map
[ https://issues.apache.org/jira/browse/IGNITE-20408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Sizov updated IGNITE-20408: --- Description: *Motivation* Local map of transaction states (local tx state map) contains non-consistent id of a transaction coordinator node. When trying to resolve write intents using coordinator path, we need to check whether the coordinator is still present in cluster and has not restarted (because if it has restarted it means it lost its volatile state, including local tx state map). But we can't get the coordinator's non-consistent id in the message handler, and can't send the message to the node using its non-consistent id, so the following race is possible: * we receive message from coordinator with its consistent id, * try to resolve its non-consistent id to save it in the local tx state map, but we get the id of restarted node from topology service, so this non-consistent id is no longer valid. There is a ticket for the improvement that will allow us to get ClusterNode containing non-consistent id in the message handler: IGNITE-20296. After that improvement we will be able to get ClusterNode as a sender and will have to replace coordinator id with ClusterNode representing coordinator in tx local state map. *Definition of done* Local map of transaction states contains ClusterNode representing the coordinator instead of its non-consistent id, and the message to the coordinator is sent using this ClusterNode as a recipient node. *Implementation details* h6. Details of the issue: # a {{NetworkMessage}} is processed in {{ReplicaManager.onReplicaMessageReceived}}, we get sender id (which is a non-consistent id) from the parameter {{senderConsistentId}}: {code} String senderId = clusterNetSvc.topologyService().getByConsistentId(senderConsistentId).id(); {code} # {{senderId}} is then stored in {{TxStateMeta}} when {{PartitionReplicaListener}} calls {{txManager.updateTxMeta}} with it.
# Later when we perform write intent resolution in {{TransactionStateResolver.resolveDistributiveTxState}} we take the previously stored sender id as the id of a coordinator node and run {code} resolveTxStateFromTxCoordinator(txId, localMeta.txCoordinatorId(), commitGrpId, timestamp0, txMetaFuture); {code} If the node was restarted after it had successfully delivered a {{NetworkMessage}} but before #1, the code from #1 may return a different sender id: {noformat} coordinator (localId = A, consistentId = 1): send message M0 (id = 1) --> primary: receive message M0 (id = 1) coordinator (localId = A, consistentId = 1): restart coordinator (localId = B, consistentId = 1): the node with the same consistent id has now a different local id, previous volatile state is lost primary: Find coordinator for write intent resolution for consistent id = 1. We get node B with no state. {noformat} was: *Motivation* Local map of transaction states (local tx state map) contains non-consistent id of a transaction coordinator node. When trying to resolve write intents using coordinator path, we need to check whether the coordinator is still present in cluster and has not restarted (because if it has restarted it means it lost its volatile state, including local tx state map). But we can't get the coordinator's non-consistent id in the message handler, and can't send the message to the node using its non-consistent id, so the following race is possible: * we receive message from coordinator with its consistent id, * try to resolve its non-consistent id to save it in the local tx state map, but we get the id of restarted node from topology service, so this non-consistent id is no longer valid. There is a ticket for the improvement that will allow us to get ClusterNode containing non-consistent id in the message handler: IGNITE-20296.
After that improvement we will be able to get ClusterNode as a sender and will have to replace coordinator id with ClusterNode representing coordinator in tx local state map. *Definition of done* Local map of transaction states contains ClusterNode representing the coordinator instead of its non-consistent id, and the message to the coordinator is sent using this ClusterNode as a recipient node. *Implementation details* First, the issue. # a {{NetworkMessage}} is processed in {{ReplicaManager.onReplicaMessageReceived}}, we get sender id (which is a non-consistent id) from the parameter {{senderConsistentId}}: {code} String senderId = clusterNetSvc.topologyService().getByConsistentId(senderConsistentId).id(); {code} # {{senderId}} is then stored in {{TxStateMeta}} when {{PartitionReplicaListener}} calls {{txManager.updateTxMeta}} with it. # Later when we perform write intent resolution in {{TransactionStateResolver.resolveDistributiveTxState}} we take the previously stored sender id as the id of a coordinator node and run {code}
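The restart race in the timeline above can be reproduced in a few lines. This is a sketch under stated assumptions, not Ignite code: {{NodeSketch}} stands in for {{ClusterNode}}, {{TopologySketch}} is a hypothetical, simplified topology view keyed by consistent id, and all method names are invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

class CoordinatorIdentitySketch {
    /** Hypothetical stand-in for ClusterNode: an ephemeral id plus a stable consistent id. */
    static final class NodeSketch {
        final String id;
        final String consistentId;

        NodeSketch(String id, String consistentId) {
            this.id = id;
            this.consistentId = consistentId;
        }
    }

    /** Hypothetical topology view; a restarted node replaces the entry for its consistent id. */
    static final class TopologySketch {
        private final Map<String, NodeSketch> byConsistentId = new HashMap<>();

        void join(NodeSketch node) {
            byConsistentId.put(node.consistentId, node);
        }

        NodeSketch getByConsistentId(String consistentId) {
            return byConsistentId.get(consistentId);
        }

        /** True only if the very same incarnation (same ephemeral id) is still in the topology. */
        boolean isSameIncarnationPresent(NodeSketch node) {
            NodeSketch current = byConsistentId.get(node.consistentId);
            return current != null && current.id.equals(node.id);
        }
    }

    /** Resolving the sender by consistent id after the restart yields the new incarnation. */
    static String idResolvedAfterRestart() {
        TopologySketch topology = new TopologySketch();
        topology.join(new NodeSketch("A", "1")); // coordinator sends M0
        topology.join(new NodeSketch("B", "1")); // coordinator restarts before the lookup
        return topology.getByConsistentId("1").id;
    }

    /** Capturing the sender node itself lets the receiver detect that it has restarted. */
    static boolean capturedSenderStillPresent() {
        TopologySketch topology = new TopologySketch();
        NodeSketch sender = new NodeSketch("A", "1"); // captured when M0 is received
        topology.join(sender);
        topology.join(new NodeSketch("B", "1"));      // restart with the same consistent id
        return topology.isSameIncarnationPresent(sender);
    }

    public static void main(String[] args) {
        System.out.println("late lookup resolves to: " + idResolvedAfterRestart());
        System.out.println("captured sender still present: " + capturedSenderStillPresent());
    }
}
```

Storing the captured node rather than an id resolved later is what allows the receiver to tell the original coordinator apart from a restarted one that has lost its volatile state.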
[jira] [Updated] (IGNITE-20408) Replace tx coordinator non-consistent ID with coordinator ClusterNode in local tx state map
[ https://issues.apache.org/jira/browse/IGNITE-20408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Sizov updated IGNITE-20408: --- Description: *Motivation* Local map of transaction states (local tx state map) contains non-consistent id of a transaction coordinator node. When trying to resolve write intents using coordinator path, we need to check whether the coordinator is still present in cluster and has not restarted (because if it has restarted it means it lost its volatile state, including local tx state map). But we can't get the coordinator's non-consistent id in the message handler, and can't send the message to the node using its non-consistent id, so the following race is possible: * we receive message from coordinator with its consistent id, * try to resolve its non-consistent id to save it in the local tx state map, but we get the id of restarted node from topology service, so this non-consistent id is no longer valid. There is a ticket for the improvement that will allow us to get ClusterNode containing non-consistent id in the message handler: IGNITE-20296. After that improvement we will be able to get ClusterNode as a sender and will have to replace coordinator id with ClusterNode representing coordinator in tx local state map. *Definition of done* Local map of transaction states contains ClusterNode representing the coordinator instead of its non-consistent id, and the message to the coordinator is sent using this ClusterNode as a recipient node. *Implementation details* First, the issue. # a {{NetworkMessage}} is processed in {{ReplicaManager.onReplicaMessageReceived}}, we get sender id (which is a non-consistent id) from the parameter {{senderConsistentId}}: {code} String senderId = clusterNetSvc.topologyService().getByConsistentId(senderConsistentId).id(); {code} # {{senderId}} is then stored in {{TxStateMeta}} when {{PartitionReplicaListener}} calls {{txManager.updateTxMeta}} with it.
# Later when we perform write intent resolution in {{TransactionStateResolver.resolveDistributiveTxState}} we take the previously stored sender id as the id of a coordinator node and run {code} resolveTxStateFromTxCoordinator(txId, localMeta.txCoordinatorId(), commitGrpId, timestamp0, txMetaFuture); {code} If the node was restarted after it had successfully delivered a {{NetworkMessage}} but before #1, the code from #1 may return a different sender id: {noformat} coordinator (localId = A, consistentId = 1): send message M0 (id = 1) --> primary: receive message M0 (id = 1) coordinator (localId = A, consistentId = 1): restart coordinator (localId = B, consistentId = 1): the node with the same consistent id has now a different local id, previous volatile state is lost primary: Find coordinator for write intent resolution for consistent id = 1. We get node B with no state. {noformat} was: *Motivation* Local map of transaction states (local tx state map) contains non-consistent id of a transaction coordinator node. When trying to resolve write intents using coordinator path, we need to check whether the coordinator is still present in cluster and has not restarted (because if it has restarted it means it lost its volatile state, including local tx state map). But we can't get the coordinator's non-consistent id in the message handler, and can't send the message to the node using its non-consistent id, so the following race is possible: * we receive message from coordinator with its consistent id, * try to resolve its non-consistent id to save it in the local tx state map, but we get the id of restarted node from topology service, so this non-consistent id is no longer valid. There is a ticket for the improvement that will allow us to get ClusterNode containing non-consistent id in the message handler: IGNITE-20296.
After that improvement we will be able to get ClusterNode as a sender and will have to replace coordinator id with ClusterNode representing coordinator in tx local state map. *Definition of done* Local map of transaction states contains ClusterNode representing the coordinator instead of its non-consistent id, and the message to the coordinator is sent using this ClusterNode as a recipient node. *Implementation details* First, the issue. # a {{NetworkMessage}} is processed in {{ReplicaManager.onReplicaMessageReceived}}, we get sender id (which is a non-consistent id) from the parameter {{senderConsistentId}}: {code} String senderId = clusterNetSvc.topologyService().getByConsistentId(senderConsistentId).id(); {code} # {{senderId}} is then stored in {{TxStateMeta}} when {{PartitionReplicaListener}} calls {{txManager.updateTxMeta}} with it. # Later when we perform write intent resolution in {{TransactionStateResolver.resolveDistributiveTxState}} we take the previously stored sender id as the id of a coordinator node and run {code}
[jira] [Updated] (IGNITE-20408) Replace tx coordinator non-consistent ID with coordinator ClusterNode in local tx state map
[ https://issues.apache.org/jira/browse/IGNITE-20408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Sizov updated IGNITE-20408: --- Description: *Motivation* Local map of transaction states (local tx state map) contains non-consistent id of a transaction coordinator node. When trying to resolve write intents using coordinator path, we need to check whether the coordinator is still present in cluster and has not restarted (because if it has restarted it means it lost its volatile state, including local tx state map). But we can't get the coordinator's non-consistent id in the message handler, and can't send the message to the node using its non-consistent id, so the following race is possible: * we receive message from coordinator with its consistent id, * try to resolve its non-consistent id to save it in the local tx state map, but we get the id of restarted node from topology service, so this non-consistent id is no longer valid. There is a ticket for the improvement that will allow us to get ClusterNode containing non-consistent id in the message handler: IGNITE-20296. After that improvement we will be able to get ClusterNode as a sender and will have to replace coordinator id with ClusterNode representing coordinator in tx local state map. *Definition of done* Local map of transaction states contains ClusterNode representing the coordinator instead of its non-consistent id, and the message to the coordinator is sent using this ClusterNode as a recipient node. *Implementation details* First, the issue. # a {{NetworkMessage}} is processed in {{ReplicaManager.onReplicaMessageReceived}}, we get sender id (which is a non-consistent id) from the parameter {{senderConsistentId}}: {code} String senderId = clusterNetSvc.topologyService().getByConsistentId(senderConsistentId).id(); {code} # {{senderId}} is then stored in {{TxStateMeta}} when {{PartitionReplicaListener}} calls {{txManager.updateTxMeta}} with it.
# Later when we perform write intent resolution in {{TransactionStateResolver.resolveDistributiveTxState}} we take the previously stored sender id as the id of a coordinator node and run {code} resolveTxStateFromTxCoordinator(txId, localMeta.txCoordinatorId(), commitGrpId, timestamp0, txMetaFuture); {code} If the node was restarted after it had successfully delivered a {{NetworkMessage}} but before #1, the code from #1 may return a different sender id: {noformat} coordinator (localId = A, consistentId = 1): send message M0 (id = 1) --> primary: receive message M0 (id = 1) coordinator (localId = A, consistentId = 1): restart coordinator (localId = B, consistentId = 1): the same node has now different local id, previous volatile state is lost primary: Find coordinator for write intent resolution for consistent id = 1. We get node B with no state. {noformat} was: *Motivation* Local map of transaction states (local tx state map) contains non-consistent id of a transaction coordinator node. When trying to resolve write intents using coordinator path, we need to check whether the coordinator is still present in cluster and has not restarted (because if it has restarted it means it lost its volatile state, including local tx state map). But we can't get the coordinator's non-consistent id in the message handler, and can't send the message to the node using its non-consistent id, so the following race is possible: * we receive message from coordinator with its consistent id, * try to resolve its non-consistent id to save it in the local tx state map, but we get the id of restarted node from topology service, so this non-consistent id is no longer valid. There is a ticket for the improvement that will allow us to get ClusterNode containing non-consistent id in the message handler: IGNITE-20296. After that improvement we will be able to get ClusterNode as a sender and will have to replace coordinator id with ClusterNode representing coordinator in tx local state map.
*Definition of done* Local map of transaction states contains ClusterNode representing the coordinator instead of its non-consistent id, and the message to the coordinator is sent using this ClusterNode as a recipient node. *Implementation details* First, the issue. # a {{NetworkMessage}} is processed in {{ReplicaManager.onReplicaMessageReceived}}, we get sender id (which is a non-consistent id) from the parameter {{senderConsistentId}}: {code} String senderId = clusterNetSvc.topologyService().getByConsistentId(senderConsistentId).id(); {code} # {{senderId}} is then stored in {{TxStateMeta}} when {{PartitionReplicaListener}} calls {{txManager.updateTxMeta}} with it. # Later when we perform write intent resolution in {{TransactionStateResolver.resolveDistributiveTxState}} we take the previously stored sender id as the id of a coordinator node and run {code} resolveTxStateFromTxCoordinator(txId,
[jira] [Updated] (IGNITE-20408) Replace tx coordinator non-consistent ID with coordinator ClusterNode in local tx state map
[ https://issues.apache.org/jira/browse/IGNITE-20408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Sizov updated IGNITE-20408: --- Description: *Motivation* Local map of transaction states (local tx state map) contains non-consistent id of a transaction coordinator node. When trying to resolve write intents using coordinator path, we need to check whether the coordinator is still present in cluster and has not restarted (because if it has restarted it means it lost its volatile state, including local tx state map). But we can't get the coordinator's non-consistent id in the message handler, and can't send the message to the node using its non-consistent id, so the following race is possible: * we receive message from coordinator with its consistent id, * try to resolve its non-consistent id to save it in the local tx state map, but we get the id of restarted node from topology service, so this non-consistent id is no longer valid. There is a ticket for the improvement that will allow us to get ClusterNode containing non-consistent id in the message handler: IGNITE-20296 . After that improvement we will be able to get ClusterNode as a sender and will have to replace coordinator id with ClusterNode representing coordinator in tx local state map. *Definition of done* Local map of transaction states contains ClusterNode representing the coordinator instead of its non-consistent id, and the message to the coordinator is sent using this ClusterNode as a recepient node. *Implementation details* First, the issue. # a {{NetworkMessage}} is processed in {{ReplicaManager.onReplicaMessageReceived}}, we get sender id (which is a non-consistent id) from the parameter {{senderConsistentId}}: {code} String senderId = clusterNetSvc.topologyService().getByConsistentId(senderConsistentId).id(); {code} # {{senderId}} is then stored in {{TxStateMeta}} when {{PartitionReplicaListener}} calls {{txManager.updateTxMeta}} with it. 
# Later, when we perform write intent resolution in {{TransactionStateResolver.resolveDistributiveTxState}}, we take the previously stored sender id as the id of a coordinator node and run {code} resolveTxStateFromTxCoordinator(txId, localMeta.txCoordinatorId(), commitGrpId, timestamp0, txMetaFuture); {code} If the node was restarted after it has successfully delivered a {{NetworkMessage}} but before #1, the code from #1 may return a different sender id: {noformat} coordinator (localId = A, consistentId = 1): send message M0 (id = 1) --> primary: receive message M0 (id = 1) coordinator (localId = A, consistentId = 1): restart coordinator (localId = B, consistentId = 1): the same node now has a different local id, previous volatile state is lost primary: Find coordinator for write intent resolution for consistent id = 1. We get node B with no state. {noformat} was: *Motivation* Local map of transaction states (local tx state map) contains non-consistent id of a transaction coordinator node. When trying to resolve write intents using coordinator path, we need to check whether the coordinator is still present in cluster and has not restarted (because if it has restarted it means it lost its volatile state, including local tx state map). But we can't get the coordinator's non-consistent id in the message handler, and can't send the message to the node using its non-consistent id, so the following race is possible: * we receive message from coordinator with its consistent id, * try to resolve its non-consistent id to save it in the local tx state map, but we get the id of restarted node from topology service, so this non-consistent id is no longer valid. There is a ticket for the improvement that will allow us to get ClusterNode containing non-consistent id in the message handler: IGNITE-20296 . After that improvement we will be able to get ClusterNode as a sender and will have to replace coordinator id with ClusterNode representing coordinator in tx local state map. 
*Definition of done* Local map of transaction states contains ClusterNode representing the coordinator instead of its non-consistent id, and the message to the coordinator is sent using this ClusterNode as a recipient node. > Replace tx coordinator non-consistent ID with coordinator ClusterNode in > local tx state map > --- > > Key: IGNITE-20408 > URL: https://issues.apache.org/jira/browse/IGNITE-20408 > Project: Ignite > Issue Type: Improvement >Reporter: Denis Chudov >Priority: Major > Labels: ignite-3 > > *Motivation* > Local map of transaction states (local tx state map) contains non-consistent > id of a transaction coordinator node. When trying to resolve write intents > using coordinator path, we need to check whether the
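The restart race described in IGNITE-20408 can be illustrated with a minimal, self-contained Java sketch. All types below ({{NodeRef}}, {{TopologySketch}}, {{CoordinatorRaceSketch}}) are hypothetical stand-ins invented for illustration and are not Ignite classes. The sketch assumes only what the ticket states: a consistent id survives a restart, a non-consistent id does not, and a lookup by consistent id after a restart returns the new incarnation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a cluster node; not an Ignite class. */
final class NodeRef {
    final String id;           // non-consistent id: changes on every restart
    final String consistentId; // consistent id: survives restarts

    NodeRef(String id, String consistentId) {
        this.id = id;
        this.consistentId = consistentId;
    }
}

/** Hypothetical topology view keyed by consistent id. */
final class TopologySketch {
    private final Map<String, NodeRef> byConsistentId = new HashMap<>();

    /** Registers a node; a restarted node replaces its previous incarnation. */
    void join(NodeRef node) {
        byConsistentId.put(node.consistentId, node);
    }

    NodeRef getByConsistentId(String consistentId) {
        return byConsistentId.get(consistentId);
    }
}

final class CoordinatorRaceSketch {
    /** Returns true only if the node captured at receive time is still the live incarnation. */
    static boolean isSameIncarnation(TopologySketch topology, NodeRef captured) {
        NodeRef current = topology.getByConsistentId(captured.consistentId);
        return current != null && current.id.equals(captured.id);
    }

    public static void main(String[] args) {
        TopologySketch topology = new TopologySketch();
        NodeRef coordinator = new NodeRef("A", "1");
        topology.join(coordinator);

        // Capture the sender itself when the message arrives, not a later lookup result.
        NodeRef captured = coordinator;

        // The coordinator restarts: same consistent id, new non-consistent id.
        topology.join(new NodeRef("B", "1"));

        // A late lookup by consistent id silently yields the restarted node.
        System.out.println(topology.getByConsistentId("1").id);
        // Comparing against the captured node exposes the restart.
        System.out.println(isSameIncarnation(topology, captured));
    }
}
```

Keeping the whole node captured at receive time, rather than re-resolving it later by consistent id, is what lets the receiver notice that the coordinator it talked to no longer exists.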
[jira] [Commented] (IGNITE-20055) Durable txCleanupReplicaRequest send from the commit partition
[ https://issues.apache.org/jira/browse/IGNITE-20055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770430#comment-17770430 ] Alexander Lapin commented on IGNITE-20055: -- [~ksizov] LGTM! > Durable txCleanupReplicaRequest send from the commit partition > -- > > Key: IGNITE-20055 > URL: https://issues.apache.org/jira/browse/IGNITE-20055 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Assignee: Kirill Sizov >Priority: Major > Labels: ignite-3, transaction3_recovery, transactions > Time Spent: 5.5h > Remaining Estimate: 0h > > h3. Motivation > It's required to continuously send txCleanupReplicaRequest to the primary > replica. The suggested flow is as follows. > h3. Definition of Done > # Resend exactly the same type of finish output that was initially evaluated, > meaning that a commit will be resent indefinitely even if the previous > txCleanupReplicaRequest returns an exception. > # Await the appearance of the commit partition primary replica if the initially > enlisted recipient fails. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
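The "resend until it succeeds" behaviour from the Definition of Done in IGNITE-20055 can be sketched with plain {{CompletableFuture}} composition. This is a minimal illustration under stated assumptions, not the Ignite implementation: {{DurableSendSketch}} and {{sendDurably}} are hypothetical names, the attempt is modelled as an arbitrary supplier of a future, and there is no backoff and no waiting for a new primary replica.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical helper; not an Ignite class. */
final class DurableSendSketch {
    /**
     * Invokes the same attempt again and again until one attempt completes normally.
     * The attempt itself never changes, so the same kind of request is resent each time.
     * Note: an attempt that fails synchronously every time would recurse without bound;
     * a real implementation would add a delay and a stop condition.
     */
    static <T> CompletableFuture<T> sendDurably(Supplier<CompletableFuture<T>> attempt) {
        return attempt.get()
                .<CompletableFuture<T>>handle((result, error) ->
                        error == null
                                ? CompletableFuture.completedFuture(result)
                                : sendDurably(attempt)) // failed: resend the same request
                .thenCompose(next -> next);
    }
}
```

A caller would pass a supplier that builds and sends the cleanup request; the returned future completes only once some attempt has succeeded.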
[jira] [Assigned] (IGNITE-20367) ItTableRaftSnapshotsTest times out with high flaky rate
[ https://issues.apache.org/jira/browse/IGNITE-20367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vyacheslav Koptilin reassigned IGNITE-20367: Assignee: Alexander Lapin > ItTableRaftSnapshotsTest times out with high flaky rate > --- > > Key: IGNITE-20367 > URL: https://issues.apache.org/jira/browse/IGNITE-20367 > Project: Ignite > Issue Type: Bug >Reporter: Alexander Lapin >Assignee: Alexander Lapin >Priority: Blocker > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > org.apache.ignite.lang.IgniteException: IGN-CMN-65535 > TraceId:f1535407-3cf9-48cd-9091-825ecf308526 at > java.base@11.0.17/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) > at > app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772) > at > app//org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:706) > at > app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:543) > at > app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:641) > at > app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:494) > at > app//org.apache.ignite.internal.sql.AbstractSession.execute(AbstractSession.java:63) > at > app//org.apache.ignite.internal.SessionUtils.executeUpdate(SessionUtils.java:38) > at > app//org.apache.ignite.internal.SessionUtils.executeUpdate(SessionUtils.java:50) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.lambda$executeDmlWithRetry$1(ItTableRaftSnapshotsTest.java:231) > at app//org.apache.ignite.internal.Cluster.doInSession(Cluster.java:448) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.lambda$executeDmlWithRetry$2(ItTableRaftSnapshotsTest.java:230) > at > 
app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.withRetry(ItTableRaftSnapshotsTest.java:184) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.executeDmlWithRetry(ItTableRaftSnapshotsTest.java:229) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.prepareClusterForInstallingSnapshotToNode2(ItTableRaftSnapshotsTest.java:351) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.snapshotInstallTimeoutDoesNotBreakSubsequentInstallsWhenSecondAttemptIsIdenticalToFirst(ItTableRaftSnapshotsTest.java:685) > at > java.base@11.0.17/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base@11.0.17/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base@11.0.17/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base@11.0.17/java.lang.reflect.Method.invoke(Method.java:566) at > app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727) > at > app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > at > app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) > at > 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) > at >
[jira] [Updated] (IGNITE-20367) ItTableRaftSnapshotsTest times out with high flaky rate
[ https://issues.apache.org/jira/browse/IGNITE-20367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vyacheslav Koptilin updated IGNITE-20367: - Priority: Blocker (was: Major) > ItTableRaftSnapshotsTest times out with high flaky rate > --- > > Key: IGNITE-20367 > URL: https://issues.apache.org/jira/browse/IGNITE-20367 > Project: Ignite > Issue Type: Bug >Reporter: Alexander Lapin >Priority: Blocker > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > org.apache.ignite.lang.IgniteException: IGN-CMN-65535 > TraceId:f1535407-3cf9-48cd-9091-825ecf308526 at > java.base@11.0.17/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) > at > app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772) > at > app//org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:706) > at > app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:543) > at > app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:641) > at > app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:494) > at > app//org.apache.ignite.internal.sql.AbstractSession.execute(AbstractSession.java:63) > at > app//org.apache.ignite.internal.SessionUtils.executeUpdate(SessionUtils.java:38) > at > app//org.apache.ignite.internal.SessionUtils.executeUpdate(SessionUtils.java:50) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.lambda$executeDmlWithRetry$1(ItTableRaftSnapshotsTest.java:231) > at app//org.apache.ignite.internal.Cluster.doInSession(Cluster.java:448) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.lambda$executeDmlWithRetry$2(ItTableRaftSnapshotsTest.java:230) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.withRetry(ItTableRaftSnapshotsTest.java:184) > at > 
app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.executeDmlWithRetry(ItTableRaftSnapshotsTest.java:229) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.prepareClusterForInstallingSnapshotToNode2(ItTableRaftSnapshotsTest.java:351) > at > app//org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.snapshotInstallTimeoutDoesNotBreakSubsequentInstallsWhenSecondAttemptIsIdenticalToFirst(ItTableRaftSnapshotsTest.java:685) > at > java.base@11.0.17/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base@11.0.17/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base@11.0.17/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base@11.0.17/java.lang.reflect.Method.invoke(Method.java:566) at > app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727) > at > app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > at > app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) > at > 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
[jira] [Commented] (IGNITE-20502) Sql. Rework fragment mapping
[ https://issues.apache.org/jira/browse/IGNITE-20502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770408#comment-17770408 ] Yury Gerzhedovich commented on IGNITE-20502: [~korlov] LGTM > Sql. Rework fragment mapping > > > Key: IGNITE-20502 > URL: https://issues.apache.org/jira/browse/IGNITE-20502 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Konstantin Orlov >Assignee: Konstantin Orlov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently, fragment mapping supports two strategies: some nodes from a list and > exact mapping for partitioned sources. To integrate System Views, we need to > support two more strategies: all nodes from a given list (for node views) and > exactly one node from a given list (for cluster views). -- This message was sent by Atlassian Jira (v8.20.10#820010)
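The list-based strategies named in IGNITE-20502 can be pictured with a small Java sketch. Everything here is an assumption made for illustration: {{NodeSelectionSketch}}, {{FragmentMappingSketch}} and the rule of preferring nodes that are already in use are hypothetical, and the exact mapping for partitioned sources is left out because it depends on partition assignments rather than on a plain node list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical names for the three list-based strategies from the ticket. */
enum NodeSelectionSketch { SOME_OF, ALL_OF, ONE_OF }

/** Hypothetical resolver; not the Ignite fragment mapper. */
final class FragmentMappingSketch {
    /**
     * Picks execution nodes from the candidates according to the strategy.
     * Nodes that are already used by other fragments are preferred when a choice exists.
     */
    static List<String> select(NodeSelectionSketch strategy, List<String> candidates, List<String> alreadyUsed) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidate nodes");
        }
        List<String> overlap = new ArrayList<>(candidates);
        overlap.retainAll(alreadyUsed);
        switch (strategy) {
            case ALL_OF: // node views: every listed node must take part
                return List.copyOf(candidates);
            case ONE_OF: // cluster views: exactly one node is enough
                return List.of(overlap.isEmpty() ? candidates.get(0) : overlap.get(0));
            case SOME_OF: // any non-empty subset is acceptable
                return overlap.isEmpty() ? List.copyOf(candidates) : List.copyOf(overlap);
            default:
                throw new AssertionError(strategy);
        }
    }
}
```

The point of the sketch is only the difference in contract: "all of" leaves no freedom, "one of" must return a single node, and "some of" lets the mapper narrow the list.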
[jira] [Updated] (IGNITE-20418) Command 'indexes_force_rebuild' should work with several certain nodes.
[ https://issues.apache.org/jira/browse/IGNITE-20418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Amelchev updated IGNITE-20418: - Fix Version/s: 2.16 > Command 'indexes_force_rebuild' should work with several certain nodes. > --- > > Key: IGNITE-20418 > URL: https://issues.apache.org/jira/browse/IGNITE-20418 > Project: Ignite > Issue Type: Task >Reporter: Vladimir Steshin >Assignee: Vladimir Steshin >Priority: Minor > Labels: ise > Fix For: 2.16 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, control.sh's command 'indexes_force_rebuild' cannot > launch an index rebuild on several specific nodes. Only one node is accepted as > a command parameter (--node-id). It would be handy to pass several nodes to > execute on, e.g. via '--nodes'. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-20466) Investigate running sonar checks from fork repositories
[ https://issues.apache.org/jira/browse/IGNITE-20466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770397#comment-17770397 ] Maxim Muzafarov commented on IGNITE-20466: -- References: pull_request_target https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request_target Checking out a merge commit in pull_request_target workflows #518 https://github.com/actions/checkout/issues/518 Feature Request |trigger action on "Pull Request Approved" #25372 https://github.com/orgs/community/discussions/25372 Run Sonar scan for PRs from forks https://stackoverflow.com/questions/76528833/run-sonar-scan-for-prs-from-forks How to use SonarCloud with a forked repository on GitHub? https://community.sonarsource.com/t/how-to-use-sonarcloud-with-a-forked-repository-on-github/7363/30 > Investigate running sonar checks from fork repositories > --- > > Key: IGNITE-20466 > URL: https://issues.apache.org/jira/browse/IGNITE-20466 > Project: Ignite > Issue Type: Task >Reporter: Maxim Muzafarov >Assignee: Maxim Muzafarov >Priority: Major > > Investigate running sonar checks from fork repositories. > See the discussion here: > https://github.com/actions/checkout/issues/518 > Additionally, we can run checks after a pull-request has been approved by a > maintainer: > https://github.com/orgs/community/discussions/25372 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-20508) DeadlockDetectionManager removal
[ https://issues.apache.org/jira/browse/IGNITE-20508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770396#comment-17770396 ] Ignite TC Bot commented on IGNITE-20508: {panel:title=Branch: [pull/10959/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} {panel:title=Branch: [pull/10959/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel} [TeamCity *-- Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7355049&buildTypeId=IgniteTests24Java8_RunAll] > DeadlockDetectionManager removal > > > Key: IGNITE-20508 > URL: https://issues.apache.org/jira/browse/IGNITE-20508 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20358) Make distributed node storage config local
[ https://issues.apache.org/jira/browse/IGNITE-20358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Gusakov updated IGNITE-20358: Issue Type: Improvement (was: Task) > Make distributed node storage config local > -- > > Key: IGNITE-20358 > URL: https://issues.apache.org/jira/browse/IGNITE-20358 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Gusakov >Assignee: Kirill Gusakov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 50m > Remaining Estimate: 0h > > *Motivation* > At the moment, all {{*StorageEngineConfigurationSchema}} configurations have the > {{ConfigurationType.DISTRIBUTED}} type. This is no longer appropriate: under the new design, each > node can have a different storage configuration. > *Definition of done* > - All {{*StorageEngineConfigurationSchema}} configurations are moved to the > {{ConfigurationType.LOCAL}} scope. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20478) Sql. Rework use of UNSPECIFIED_VALUE_PLACEHOLDER in row.
[ https://issues.apache.org/jira/browse/IGNITE-20478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-20478: -- Description: Currently, when scanning an index, we set a special value called "UNSPECIFIED_VALUE_PLACEHOLDER" to row. Which means that any value matches the bound (more details in IGNITE-16443). To be able to complete the transition to using a binary tuple, we need to rework this approach and try to avoid storing non-conforming schema values in row. Currently, this placeholder sets to row when the search bound is open (that is, when the RexNode is null in the list when creating a scalar). {{ExpressionFactoryImpl#expandBounds}} needs to be reworked and there should be no open bounds (see {{ExpressionFactoryImpl#compile}} all nodes elements must not be null). After reworking {{expandBounds}} the {{searchRow}} that comes to {{RowConverter#toBinaryTuplePrefix}} should already contain a prefix only. The code {{ExpressionFactoryImpl#comparator}} that uses this placeholder does not appear to be executing and can be removed. was: Currently, when scanning an index, we set a special value called "UNSPECIFIED_VALUE_PLACEHOLDER" to row. Which means that any value matches the bound (more details in IGNITE-16443). To be able to complete the transition to using a binary tuple, we need to rework this approach and try to avoid storing non-conforming schema values in row. Currently, this placeholder sets to row when the search bound is open (that is, when the RexNode is null in the list when creating a scalar). {{ExpressionFactoryImpl#expandBounds}} needs to be reworked and there should be no open bounds (see {{ExpressionFactoryImpl#compile}} all nodes elements must not be null). After reworking {{expandBounds}} the {{searchRow}} that comes to {{RowConverter#toBinaryTuplePrefix}} should already contain a prefix only. 
In {{ExpressionFactoryImpl#comparator}} this placeholder does not seem to be used and this code can be removed. > Sql. Rework use of UNSPECIFIED_VALUE_PLACEHOLDER in row. > > > Key: IGNITE-20478 > URL: https://issues.apache.org/jira/browse/IGNITE-20478 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Pavel Pereslegin >Priority: Major > Labels: ignite-3 > > Currently, when scanning an index, we set a special value called > "UNSPECIFIED_VALUE_PLACEHOLDER" to row. Which means that any value matches > the bound (more details in IGNITE-16443). > To be able to complete the transition to using a binary tuple, we need to > rework this approach and try to avoid storing non-conforming schema values in > row. > Currently, this placeholder sets to row when the search bound is open (that > is, when the RexNode is null in the list when creating a scalar). > {{ExpressionFactoryImpl#expandBounds}} needs to be reworked and there should > be no open bounds (see {{ExpressionFactoryImpl#compile}} all nodes elements > must not be null). > After reworking {{expandBounds}} the {{searchRow}} that comes to > {{RowConverter#toBinaryTuplePrefix}} should already contain a prefix only. > The code {{ExpressionFactoryImpl#comparator}} that uses this placeholder does > not appear to be executing and can be removed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20478) Sql. Rework use of UNSPECIFIED_VALUE_PLACEHOLDER in row.
[ https://issues.apache.org/jira/browse/IGNITE-20478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-20478: -- Description: Currently, when scanning an index, we set a special value called "UNSPECIFIED_VALUE_PLACEHOLDER" to row. Which means that any value matches the bound (more details in IGNITE-16443). To be able to complete the transition to using a binary tuple, we need to rework this approach and try to avoid storing non-conforming schema values in row. Currently, this placeholder is set to row when the search bound is open (that is, when the RexNode is null in the list when creating a scalar). {{ExpressionFactoryImpl#expandBounds}} needs to be reworked and there should be no open bounds (see {{ExpressionFactoryImpl#compile}} all nodes elements must not be null). After reworking {{expandBounds}} the {{searchRow}} that comes to {{RowConverter#toBinaryTuplePrefix}} should already contain a prefix only. In {{ExpressionFactoryImpl#comparator}} this placeholder does not seem to be used and this code can be removed. was: Currently, when scanning an index, we set a special value called "UNSPECIFIED_VALUE_PLACEHOLDER" to row. Which means that any value matches the bound (more details in IGNITE-16443). To be able to complete the transition to using a binary tuple, we need to rework this approach and try to avoid storing non-conforming schema values in row. > Sql. Rework use of UNSPECIFIED_VALUE_PLACEHOLDER in row. > > > Key: IGNITE-20478 > URL: https://issues.apache.org/jira/browse/IGNITE-20478 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Pavel Pereslegin >Priority: Major > Labels: ignite-3 > > Currently, when scanning an index, we set a special value called > "UNSPECIFIED_VALUE_PLACEHOLDER" to row. Which means that any value matches > the bound (more details in IGNITE-16443). 
> To be able to complete the transition to using a binary tuple, we need to > rework this approach and try to avoid storing non-conforming schema values in > row. > Currently, this placeholder is set to row when the search bound is open (that > is, when the RexNode is null in the list when creating a scalar). > {{ExpressionFactoryImpl#expandBounds}} needs to be reworked and there should > be no open bounds (see {{ExpressionFactoryImpl#compile}} all nodes elements > must not be null). > After reworking {{expandBounds}} the {{searchRow}} that comes to > {{RowConverter#toBinaryTuplePrefix}} should already contain a prefix only. > In {{ExpressionFactoryImpl#comparator}} this placeholder does not seem to be > used and this code can be removed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20478) Sql. Rework use of UNSPECIFIED_VALUE_PLACEHOLDER in row.
[ https://issues.apache.org/jira/browse/IGNITE-20478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-20478: -- Description: Currently, when scanning an index, we set a special value called "UNSPECIFIED_VALUE_PLACEHOLDER" to row. Which means that any value matches the bound (more details in IGNITE-16443). To be able to complete the transition to using a binary tuple, we need to rework this approach and try to avoid storing non-conforming schema values in row. Currently, this placeholder sets to row when the search bound is open (that is, when the RexNode is null in the list when creating a scalar). {{ExpressionFactoryImpl#expandBounds}} needs to be reworked and there should be no open bounds (see {{ExpressionFactoryImpl#compile}} all nodes elements must not be null). After reworking {{expandBounds}} the {{searchRow}} that comes to {{RowConverter#toBinaryTuplePrefix}} should already contain a prefix only. In {{ExpressionFactoryImpl#comparator}} this placeholder does not seem to be used and this code can be removed. was: Currently, when scanning an index, we set a special value called "UNSPECIFIED_VALUE_PLACEHOLDER" to row. Which means that any value matches the bound (more details in IGNITE-16443). To be able to complete the transition to using a binary tuple, we need to rework this approach and try to avoid storing non-conforming schema values in row. Currently, this placeholder is set to row when the search bound is open (that is, when the RexNode is null in the list when creating a scalar). {{ExpressionFactoryImpl#expandBounds}} needs to be reworked and there should be no open bounds (see {{ExpressionFactoryImpl#compile}} all nodes elements must not be null). After reworking {{expandBounds}} the {{searchRow}} that comes to {{RowConverter#toBinaryTuplePrefix}} should already contain a prefix only. In {{ExpressionFactoryImpl#comparator}} this placeholder does not seem to be used and this code can be removed. > Sql. 
Rework use of UNSPECIFIED_VALUE_PLACEHOLDER in row. > > > Key: IGNITE-20478 > URL: https://issues.apache.org/jira/browse/IGNITE-20478 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Pavel Pereslegin >Priority: Major > Labels: ignite-3 > > Currently, when scanning an index, we set a special value called > "UNSPECIFIED_VALUE_PLACEHOLDER" to row. Which means that any value matches > the bound (more details in IGNITE-16443). > To be able to complete the transition to using a binary tuple, we need to > rework this approach and try to avoid storing non-conforming schema values in > row. > Currently, this placeholder sets to row when the search bound is open (that > is, when the RexNode is null in the list when creating a scalar). > {{ExpressionFactoryImpl#expandBounds}} needs to be reworked and there should > be no open bounds (see {{ExpressionFactoryImpl#compile}} all nodes elements > must not be null). > After reworking {{expandBounds}} the {{searchRow}} that comes to > {{RowConverter#toBinaryTuplePrefix}} should already contain a prefix only. > In {{ExpressionFactoryImpl#comparator}} this placeholder does not seem to be > used and this code can be removed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20358) Make distributed node storage config local
[ https://issues.apache.org/jira/browse/IGNITE-20358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-20358: --- Fix Version/s: 3.0.0-beta2 Reviewer: Ivan Bessonov > Make distributed node storage config local > -- > > Key: IGNITE-20358 > URL: https://issues.apache.org/jira/browse/IGNITE-20358 > Project: Ignite > Issue Type: Task >Reporter: Kirill Gusakov >Assignee: Kirill Gusakov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 50m > Remaining Estimate: 0h > > *Motivation* > At the moment, all {{*StorageEngineConfigurationSchema}} configurations have the > {{ConfigurationType.DISTRIBUTED}} type. This is no longer appropriate: under the new design, each > node can have a different storage configuration. > *Definition of done* > - All {{*StorageEngineConfigurationSchema}} configurations are moved to the > {{ConfigurationType.LOCAL}} scope. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20519) Add causality token of the last update of catalog descriptors
Mirza Aliev created IGNITE-20519: Summary: Add causality token of the last update of catalog descriptors Key: IGNITE-20519 URL: https://issues.apache.org/jira/browse/IGNITE-20519 Project: Ignite Issue Type: Bug Reporter: Mirza Aliev -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20493) Ignite website shows downloading version 2.11 as latest version
[ https://issues.apache.org/jira/browse/IGNITE-20493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erlan Aytpaev updated IGNITE-20493: --- Component/s: website > Ignite website shows downloading version 2.11 as latest version > --- > > Key: IGNITE-20493 > URL: https://issues.apache.org/jira/browse/IGNITE-20493 > Project: Ignite > Issue Type: Task > Components: website >Reporter: Erlan Aytpaev >Assignee: Erlan Aytpaev >Priority: Major > > !https://lists.apache.org/api/email.lua?attachment=true=1c01nt4nol691fxz5k71zpwd5r60d0ql=08cc11e094cb73962012551428c510b4c62b6064ee6e2e07737241c552039874! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (IGNITE-17841) Update list of PMC members and Committers
[ https://issues.apache.org/jira/browse/IGNITE-17841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erlan Aytpaev closed IGNITE-17841. -- > Update list of PMC members and Committers > - > > Key: IGNITE-17841 > URL: https://issues.apache.org/jira/browse/IGNITE-17841 > Project: Ignite > Issue Type: Task > Components: website >Reporter: Kseniya Romanova >Assignee: Erlan Aytpaev >Priority: Trivial > > Please add to the page > [https://ignite.apache.org/our-community.html#community] > > 1. new PMC member: > Ivan Daschinsky [https://whimsy.apache.org/roster/committer/ivandasch] > [https://github.com/ivandasch] > > 2. new Committers: > Kirill Tkalenko [https://whimsy.apache.org/roster/committer/tkalkirill] > [https://github.com/tkalkirill] > Mikhail Petrov [https://whimsy.apache.org/roster/committer/mpetrov] > > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20492) NPE in PartitionReplicaListener's primary replica retrieval
[ https://issues.apache.org/jira/browse/IGNITE-20492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vyacheslav Koptilin updated IGNITE-20492: - Ignite Flags: (was: Docs Required,Release Notes Required) > NPE in PartitionReplicaListener's primary replica retrieval > --- > > Key: IGNITE-20492 > URL: https://issues.apache.org/jira/browse/IGNITE-20492 > Project: Ignite > Issue Type: Bug >Reporter: Kirill Sizov >Priority: Major > Labels: ignite-3 > > PartitionReplicaListener.ensureReplicaIsPrimary has the following block of > code > {code:java} > if (expectedTerm != null) { > return placementDriver.getPrimaryReplica(replicationGroupId, now) > .thenCompose(primaryReplica -> { > long currentEnlistmentConsistencyToken = > primaryReplica.getStartTime().longValue(); > {code} > However, according to the placementDriver's contract, {{getPrimaryReplica}} > can complete with null: > {quote} > Same as awaitPrimaryReplica(ReplicationGroupId, HybridTimestamp) despite the > fact that given method await logic is bounded. It will wait for a primary > replica for a reasonable period of time, and complete a future with null if a > matching lease isn't found. Generally speaking reasonable here means enough > for distribution across cluster nodes. > {quote} > In that case ensureReplicaIsPrimary will crash with NPE: > {noformat} > ... 3 more > Caused by: java.lang.NullPointerException > at > org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$ensureReplicaIsPrimary$155(PartitionReplicaListener.java:2397) > ~[ignite-table-3.0.0-SNAPSHOT.jar:?] > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) > ~[?:?] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > ~[?:?] > at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) > ~[?:?] 
> at > org.apache.ignite.internal.util.PendingComparableValuesTracker.lambda$completeWaitersOnUpdate$0(PendingComparableValuesTracker.java:169) > ~[ignite-core-3.0.0-SNAPSHOT.jar:?] > at java.util.concurrent.ConcurrentMap.forEach(ConcurrentMap.java:122) ~[?:?] > at > org.apache.ignite.internal.util.PendingComparableValuesTracker.completeWaitersOnUpdate(PendingComparableValuesTracker.java:169) > ~[ignite-core-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.util.PendingComparableValuesTracker.update(PendingComparableValuesTracker.java:103) > ~[ignite-core-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.server.time.ClusterTimeImpl.updateSafeTime(ClusterTimeImpl.java:146) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.onSafeTimeAdvanced(MetaStorageManagerImpl.java:849) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$1.onSafeTimeAdvanced(MetaStorageManagerImpl.java:456) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.server.WatchProcessor.invokeOnRevisionCallback(WatchProcessor.java:247) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$notifyWatches$2(WatchProcessor.java:148) > ~[ignite-metastorage-3.0.0-SNAPSHOT.jar:?] > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) > ~[?:?] > at > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) > ~[?:?] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
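A minimal sketch of how the null completion described in this issue could be guarded before dereferencing the result. It is not the actual fix from IGNITE-20484; `ReplicaMetaSketch`, `PrimaryReplicaMissSketchException`, and `EnsurePrimarySketch` are hypothetical stand-ins introduced only for illustration, and only `CompletableFuture` is a real API here.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the lease metadata returned by the placement driver. */
final class ReplicaMetaSketch {
    private final long startTime;

    ReplicaMetaSketch(long startTime) {
        this.startTime = startTime;
    }

    long startTime() {
        return startTime;
    }
}

/** Hypothetical exception for "no matching lease was found within the bounded wait". */
class PrimaryReplicaMissSketchException extends RuntimeException {
    PrimaryReplicaMissSketchException(String message) {
        super(message);
    }
}

class EnsurePrimarySketch {
    /**
     * Completes with the enlistment consistency token, or fails with a descriptive
     * exception when the lookup future completes with null.
     */
    static CompletableFuture<Long> currentToken(CompletableFuture<ReplicaMetaSketch> lookup) {
        return lookup.thenCompose(primaryReplica -> {
            if (primaryReplica == null) {
                // The bounded wait expired without a matching lease: fail explicitly instead of hitting an NPE.
                return CompletableFuture.failedFuture(
                        new PrimaryReplicaMissSketchException("No primary replica found for the group"));
            }

            return CompletableFuture.completedFuture(primaryReplica.startTime());
        });
    }
}
```

Failing the future with a dedicated exception keeps the error visible to the caller, who can then decide whether to retry; whether the real code should retry, fail, or fall back is not stated in the issue.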
[jira] [Commented] (IGNITE-20418) Command 'indexes_force_rebuild' should work with several certain nodes.
[ https://issues.apache.org/jira/browse/IGNITE-20418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770346#comment-17770346 ] Ignite TC Bot commented on IGNITE-20418: {panel:title=Branch: [pull/10941/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} {panel:title=Branch: [pull/10941/head] Base: [master] : New Tests (10)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1} {color:#8b}Control Utility 2{color} [[tests 10|https://ci2.ignite.apache.org/viewLog.html?buildId=7353263]] * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testWithNodeFilter[cmdHnd=jmx] - PASSED{color} * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testIndexRebuildAllNodes[cmdHnd=cli] - PASSED{color} * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testWithNodeFilter[cmdHnd=cli] - PASSED{color} * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testIndexRebuildAllNodes[cmdHnd=jmx] - PASSED{color} * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testIndexRebuildOutputTwoNodes[cmdHnd=cli] - PASSED{color} * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testEmptyResultTwoNodes[cmdHnd=cli] - PASSED{color} * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testInvalidArgumentGroups[cmdHnd=cli] - PASSED{color} * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testInvalidArgumentGroups[cmdHnd=jmx] - PASSED{color} * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testIndexRebuildOutputTwoNodes[cmdHnd=jmx] - PASSED{color} * {color:#013220}IgniteControlUtilityTestSuite2: GridCommandHandlerIndexForceRebuildTest.testEmptyResultTwoNodes[cmdHnd=jmx] - PASSED{color} {panel} 
[TeamCity *-- Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7353267buildTypeId=IgniteTests24Java8_RunAll] > Command 'indexes_force_rebuild' should work with several certain nodes. > --- > > Key: IGNITE-20418 > URL: https://issues.apache.org/jira/browse/IGNITE-20418 > Project: Ignite > Issue Type: Task >Reporter: Vladimir Steshin >Assignee: Vladimir Steshin >Priority: Minor > Labels: ise > > Currently, control.sh's command 'indexes_force_rebuild' cannot > launch an index rebuild on several specific nodes. Only one node is accepted as > a command parameter (--node-id). It would be handy to pass several nodes to > execute on, e.g. via '--nodes'. -- This message was sent by Atlassian Jira (v8.20.10#820010)
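A minimal sketch of how a multi-node argument might be parsed. The issue only suggests an option "like '--nodes'"; the comma-separated format, the use of UUIDs as node IDs, and the `NodeListArgSketch` class are assumptions made for illustration, not the control.sh implementation.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

class NodeListArgSketch {
    /**
     * Parses a comma-separated list of node IDs (assumed format) into a set,
     * keeping the order in which the IDs were given and dropping duplicates.
     */
    static Set<UUID> parseNodeIds(String arg) {
        Set<UUID> ids = new LinkedHashSet<>();

        for (String part : arg.split(",")) {
            String trimmed = part.trim();

            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("Empty node id in list: " + arg);
            }

            // UUID.fromString throws IllegalArgumentException on malformed input.
            ids.add(UUID.fromString(trimmed));
        }

        return ids;
    }
}
```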
[jira] [Assigned] (IGNITE-20435) Preserve key order in InternalTableImpl#deleteAll
[ https://issues.apache.org/jira/browse/IGNITE-20435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vyacheslav Koptilin reassigned IGNITE-20435: Assignee: Vladislav Pyatkov > Preserve key order in InternalTableImpl#deleteAll > - > > Key: IGNITE-20435 > URL: https://issues.apache.org/jira/browse/IGNITE-20435 > Project: Ignite > Issue Type: Bug >Reporter: Igor Sapego >Assignee: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > > IGNITE-16004 fixed ordering for most multi-key methods, but not for > the removeAll methods. > For example, removeAll(1, 2, 3) should return 1, 3 if values for 1 and 3 > don't exist, but in practice this order may be broken. -- This message was sent by Atlassian Jira (v8.20.10#820010)
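One way the expected ordering could be restored, sketched under the assumption that the keys are split across partitions and each partition reports the keys it did not find. `OrderedMergeSketch` and its method are hypothetical and are not taken from InternalTableImpl.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OrderedMergeSketch {
    /**
     * Returns the keys that were not removed, in the order the caller passed them,
     * regardless of the order in which the per-partition results arrived.
     */
    static <K> List<K> mergeInRequestOrder(
            List<K> requestedKeys,
            Collection<? extends Collection<K>> perPartitionMisses
    ) {
        Set<K> missed = new HashSet<>();

        for (Collection<K> partitionMisses : perPartitionMisses) {
            missed.addAll(partitionMisses);
        }

        List<K> result = new ArrayList<>();

        // Walk the original request so the output follows the caller's key order.
        for (K key : requestedKeys) {
            if (missed.contains(key)) {
                result.add(key);
            }
        }

        return result;
    }
}
```

For the example in the issue, a request for keys 1, 2, 3 whose partitions report misses as [3] and [1] yields [1, 3].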
[jira] [Updated] (IGNITE-20425) Corrupted Raft FSM state after restart
[ https://issues.apache.org/jira/browse/IGNITE-20425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-20425: - Description:
According to the protocol, there are several numeric indexes in the Log / FSM:
* {{lastLogIndex}} - index of the last logged log entry.
* {{committedIndex}} - index of the last committed log entry. {{committedIndex <= lastLogIndex}}.
* {{appliedIndex}} - index of the last log entry processed by the state machine. {{appliedIndex <= committedIndex}}.

If the committed index is less than the last index, Raft can invoke the "truncate suffix" procedure and delete the uncommitted tail of the log. This is a valid thing to do.

Now, imagine the following scenario:
* {{lastIndex == 12}}, {{committedIndex == 11}}
* The node is restarted.
* Upon recovery, we replay the entire log. Now {{appliedIndex == 12}}.
* After recovery, we join the group and receive a "truncate suffix" command in order to delete uncommitted entries.
* We must delete entry 12, but it's already applied. The peer is broken.

The reason is that we don't use the default recovery procedure: {{org.apache.ignite.raft.jraft.core.NodeImpl#init}}
Canonical Raft doesn't replay the log before the join is complete.

A down-to-earth scenario that shows this situation in practice:
* Start a group with 3 nodes: A, B, and C.
* We assume that A is the leader.
* Shut down A; leader re-election is triggered.
* We assume that B votes for C.
* C receives the grant from B and proceeds to write the new configuration into its local log.
* Shut down B before it writes the same log entry (an easily reproducible race).
* Shut down C.
* Restart the cluster.

Resulting states:
A - [1: initial cfg]
B - [1: initial cfg]
C - [1: initial cfg, 2: re-election]

h3. How to fix
option a. Recover the log after join. This is not optimal; it's like performing local recovery after cluster activation in Ignite 2. We fixed that behavior a long time ago.
option b. Somehow track the committed index and perform a partial recovery that guarantees safety. We could write the committed index into log storage periodically.

"b" is better, but maybe there are other ways as well.

h3. Upd #1
Most likely we can simply remove all of the "await log replay" code on Raft node start, because it's no longer needed. Originally it was introduced to enable direct storage reads on the primary replica, which is now covered properly by
{code:java}
/**
 * Tries to read index from group leader and wait for this index to appear in local storage. Can possibly return failed future with
 * timeout exception, and in this case, replica would not answer to placement driver, because the response is useless. Placement driver
 * should handle this.
 *
 * @param expirationTime Lease expiration time.
 * @return Future that is completed when local storage catches up the index that is actual for leader on the moment of request.
 */
private CompletableFuture waitForActualState(long expirationTime) {
    LOG.info("Waiting for actual storage state, group=" + groupId());

    long timeout = expirationTime - currentTimeMillis();
    if (timeout <= 0) {
        return failedFuture(new TimeoutException());
    }

    return retryOperationUntilSuccess(raftClient::readIndex, e -> currentTimeMillis() > expirationTime, executor)
            .orTimeout(timeout, TimeUnit.MILLISECONDS)
            .thenCompose(storageIndexTracker::waitFor);
}
{code}
The same applies to RO access: we await the safeTime that has a happens-before (HB) relation with the corresponding storage updates.

was:
According to the protocol, there are several numeric indexes in the Log / FSM:
* {{lastLogIndex}} - index of the last logged log entry.
* {{committedIndex}} - index of the last committed log entry. {{committedIndex <= lastLogIndex}}.
* {{appliedIndex}} - index of the last log entry processed by the state machine. {{appliedIndex <= committedIndex}}.

If the committed index is less than the last index, Raft can invoke the "truncate suffix" procedure and delete the uncommitted tail of the log. This is a valid thing to do.

Now, imagine the following scenario:
* {{lastIndex == 12}}, {{committedIndex == 11}}
* The node is restarted.
* Upon recovery, we replay the entire log. Now {{appliedIndex == 12}}.
* After recovery, we join the group and receive a "truncate suffix" command in order to delete uncommitted entries.
* We must delete entry 12, but it's already applied. The peer is broken.

The reason is that we don't use the default recovery procedure: {{org.apache.ignite.raft.jraft.core.NodeImpl#init}}
Canonical Raft doesn't replay the log before the join is complete.

A down-to-earth scenario that shows this situation in practice:
* Start a group with 3 nodes: A, B, and C.
* We assume that A is the leader.
* Shut down A; leader re-election is triggered.
* We assume that B votes for C.
* C receives the grant from B and proceeds to write the new configuration into its local log.
* Shutdown B before it writes the same log
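The safety condition behind "option b" can be sketched as follows. This is only an illustration of the index arithmetic from the description, not JRaft or Ignite code; `RecoveryBoundSketch` and its method names are hypothetical.

```java
class RecoveryBoundSketch {
    /**
     * Highest log index that is safe to apply during local recovery:
     * never beyond the committed index that was persisted before the restart.
     */
    static long replayUpperBound(long lastLogIndex, long persistedCommittedIndex) {
        return Math.min(lastLogIndex, persistedCommittedIndex);
    }

    /**
     * A "truncate suffix" that keeps entries up to lastIndexKept is safe
     * only if the state machine has not applied anything beyond that index.
     */
    static boolean truncationIsSafe(long appliedIndex, long lastIndexKept) {
        return appliedIndex <= lastIndexKept;
    }
}
```

With the numbers from the scenario, replaying the entire log gives `appliedIndex == 12`, so truncating entry 12 is unsafe; bounding the replay by the persisted committed index (11) keeps the truncation safe. A periodically written committed index may lag behind the real one, which only makes the bound more conservative.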
[jira] [Commented] (IGNITE-20470) Ducktape to check dump performance
[ https://issues.apache.org/jira/browse/IGNITE-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770340#comment-17770340 ] Nikolay Izhikov commented on IGNITE-20470: -- https://github.com/apache/ignite/pull/10953 - cache_dumps is a feature branch to add the test > Ducktape to check dump performance > -- > > Key: IGNITE-20470 > URL: https://issues.apache.org/jira/browse/IGNITE-20470 > Project: Ignite > Issue Type: Task >Reporter: Nikolay Izhikov >Priority: Major > Labels: IEP-109, ise > > Dump creation can affect transaction performance through the change listener and > disk operations. We must create a ducktape test to check this. > Example test scenario: > * Start nodes. > * Start transaction operations: insert, update, remove. > * Create a dump. > * Check dump consistency. > Measure: > * Transaction performance penalty. > * GC utilization. > * Disk utilization. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-20356) Sql. Rework RowHandler "set" operation.
[ https://issues.apache.org/jira/browse/IGNITE-20356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin reassigned IGNITE-20356: - Assignee: (was: Pavel Pereslegin) > Sql. Rework RowHandler "set" operation. > --- > > Key: IGNITE-20356 > URL: https://issues.apache.org/jira/browse/IGNITE-20356 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Pavel Pereslegin >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > In IGNITE-19791, a wrapper over {{BinaryTuple}} was added. > This wrapper ({{BinaryTupleRowWrapper}}) does not support the "{{set()}}" > method. > Instead of using the {{set(int, RowT, Object)}} method, we can use the > {{append(RowT, Object)}} method to add field values sequentially. > We need to: > * Add a new RowFactory method that returns a builder that allows you to > append values to a row and build the row. > * Remove the {{RowHandler#set()}} method and rework all related code/tests > to use the builder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
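A rough sketch of the append-only builder idea described above. The real RowFactory and RowHandler signatures are not shown in the issue, so `RowBuilderSketch`, `addField`, and the `Object[]` row representation are hypothetical and chosen only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder: values are appended in column order, then the row is built once. */
class RowBuilderSketch {
    private final int columnCount;
    private final List<Object> values = new ArrayList<>();

    RowBuilderSketch(int columnCount) {
        this.columnCount = columnCount;
    }

    /** Appends the value of the next column; there is no positional set(). */
    RowBuilderSketch addField(Object value) {
        if (values.size() == columnCount) {
            throw new IllegalStateException("All " + columnCount + " fields are already set");
        }

        values.add(value);

        return this;
    }

    /** Builds the row; fails if some columns were not appended. */
    Object[] build() {
        if (values.size() != columnCount) {
            throw new IllegalStateException(
                    "Expected " + columnCount + " fields, got " + values.size());
        }

        return values.toArray();
    }
}
```

Sequential appends suit an immutable binary layout, because the row can be serialized in a single pass once every field is known.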
[jira] [Created] (IGNITE-20518) Use CatalogService in JdbcMetadataCatalog
Roman Puchkovskiy created IGNITE-20518: -- Summary: Use CatalogService in JdbcMetadataCatalog Key: IGNITE-20518 URL: https://issues.apache.org/jira/browse/IGNITE-20518 Project: Ignite Issue Type: Improvement Reporter: Roman Puchkovskiy Assignee: Roman Puchkovskiy Fix For: 3.0.0-beta2 Currently, {{JdbcMetadataCatalog}} uses {{TableManager}} to get tables' metadata. It is enough to use {{CatalogService}}; it is also more suitable, as it allows getting a consistent snapshot thanks to its timestamp support. -- This message was sent by Atlassian Jira (v8.20.10#820010)
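To illustrate why timestamp support gives a consistent snapshot, here is a toy model of a versioned catalog. It does not use the real CatalogService API; `VersionedCatalogSketch`, `register`, and `tablesAt` are hypothetical names, and plain `long` values stand in for timestamps.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model: each catalog version becomes visible at its activation timestamp. */
class VersionedCatalogSketch {
    private final TreeMap<Long, List<String>> versions = new TreeMap<>();

    /** Registers the full list of table names that is valid from activationTs onwards. */
    void register(long activationTs, List<String> tableNames) {
        versions.put(activationTs, List.copyOf(tableNames));
    }

    /** Returns the table list of the latest version activated at or before ts. */
    List<String> tablesAt(long ts) {
        Map.Entry<Long, List<String>> entry = versions.floorEntry(ts);

        return entry == null ? List.of() : entry.getValue();
    }
}
```

Every metadata call that passes the same timestamp sees the same version, even if tables are created or dropped concurrently.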