[jira] [Updated] (IGNITE-20916) API future of table creation may be completed incorrectly
[ https://issues.apache.org/jira/browse/IGNITE-20916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Chudov updated IGNITE-20916:
--

Description:
After some changes, the map TableManager#tableCreateFuts is used in a single place only: TableManager#completeApiCreateFuture, which makes no sense. It should either be removed or (more probably) there are races on table creation from the user API's point of view.

Basically, partition start should be reworked and responsibility for partitions moved from tables to zones; the table start process will then look completely different and will be split across different meta storage revisions (unlike now, when it happens within a single revision update).

was: After some changes, the map TableManager#tableCreateFuts is used in a single place only: TableManager#completeApiCreateFuture, which makes no sense. It should either be removed or (more probably) there are races on table creation from the user API's point of view.

> API future of table creation may be completed incorrectly
> -
>
> Key: IGNITE-20916
> URL: https://issues.apache.org/jira/browse/IGNITE-20916
> Project: Ignite
> Issue Type: Bug
> Reporter: Denis Chudov
> Priority: Major
> Labels: ignite-3
>
> After some changes, the map TableManager#tableCreateFuts is used in a single place only: TableManager#completeApiCreateFuture, which makes no sense. It should either be removed or (more probably) there are races on table creation from the user API's point of view.
> Basically, partition start should be reworked and responsibility for partitions moved from tables to zones; the table start process will then look completely different and will be split across different meta storage revisions (unlike now, when it happens within a single revision update).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (IGNITE-20935) Remove MetaStorageManagerImpl#getService method
[ https://issues.apache.org/jira/browse/IGNITE-20935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20935: - Fix Version/s: 3.0.0-beta2 > Remove MetaStorageManagerImpl#getService method > --- > > Key: IGNITE-20935 > URL: https://issues.apache.org/jira/browse/IGNITE-20935 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Aleksandr Polovtcev >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > > It is duplicated by the {{MetaStorageManagerImpl#metaStorageServiceFuture}} > method. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20938) Extract SQL tests from runner module to ignite-sql-engine
[ https://issues.apache.org/jira/browse/IGNITE-20938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yury Gerzhedovich updated IGNITE-20938:
---

Description:
The runner module incorporates a big amount of tests related to other modules, which leads to a long running time for the suite on TC.

Currently, integration tests for SQL functionality are located in the ignite-runner module. They need to be extracted to the ignite-sql-engine module via the runner test-fixtures support to decrease the test execution time of the runner module.

IGNITE-20670 can be used as a reference for such activities.

was:
The runner module incorporates a big amount of tests related to other modules, which leads to a long running time for the suite on TC.

Currently, integration tests for SQL functionality are located in the ignite-runner module. They need to be extracted to the ignite-sql-engine module via the runner test-fixtures support to decrease the test execution time of the runner module.

> Extract SQL tests from runner module to ignite-sql-engine
> -
>
> Key: IGNITE-20938
> URL: https://issues.apache.org/jira/browse/IGNITE-20938
> Project: Ignite
> Issue Type: Improvement
> Components: sql
> Reporter: Yury Gerzhedovich
> Priority: Major
> Labels: ignite-3
>
> The runner module incorporates a big amount of tests related to other modules, which leads to a long running time for the suite on TC.
> Currently, integration tests for SQL functionality are located in the ignite-runner module. They need to be extracted to the ignite-sql-engine module via the runner test-fixtures support to decrease the test execution time of the runner module.
> IGNITE-20670 can be used as a reference for such activities.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (IGNITE-20937) Extract JDBC tests from runner module to ignite-jdbc
[ https://issues.apache.org/jira/browse/IGNITE-20937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yury Gerzhedovich updated IGNITE-20937:
---

Description:
The runner module incorporates a big amount of tests related to other modules, which leads to a long running time for the suite on TC.

Currently, integration tests for JDBC functionality are located in the ignite-runner module. They need to be extracted to the JDBC module via the runner test-fixtures support to decrease the test execution time of the runner module.

IGNITE-20670 can be used as a reference for such activities.

was:
The runner module incorporates a big amount of tests related to other modules, which leads to a long running time for the suite on TC.

Currently, integration tests for JDBC functionality are located in the ignite-runner module. They need to be extracted to the JDBC module via the runner test-fixtures support to decrease the test execution time of the runner module.

> Extract JDBC tests from runner module to ignite-jdbc
> 
>
> Key: IGNITE-20937
> URL: https://issues.apache.org/jira/browse/IGNITE-20937
> Project: Ignite
> Issue Type: Improvement
> Components: sql
> Reporter: Yury Gerzhedovich
> Priority: Major
> Labels: ignite-3
>
> The runner module incorporates a big amount of tests related to other modules, which leads to a long running time for the suite on TC.
> Currently, integration tests for JDBC functionality are located in the ignite-runner module. They need to be extracted to the JDBC module via the runner test-fixtures support to decrease the test execution time of the runner module.
> IGNITE-20670 can be used as a reference for such activities.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (IGNITE-20938) Extract SQL tests from runner module to ignite-sql-engine
Yury Gerzhedovich created IGNITE-20938:
--

Summary: Extract SQL tests from runner module to ignite-sql-engine
Key: IGNITE-20938
URL: https://issues.apache.org/jira/browse/IGNITE-20938
Project: Ignite
Issue Type: Improvement
Components: sql
Reporter: Yury Gerzhedovich

The runner module incorporates a big amount of tests related to other modules, which leads to a long running time for the suite on TC.

Currently, integration tests for SQL functionality are located in the ignite-runner module. They need to be extracted to the ignite-sql-engine module via the runner test-fixtures support to decrease the test execution time of the runner module.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (IGNITE-20937) Extract JDBC tests from runner module to ignite-jdbc
Yury Gerzhedovich created IGNITE-20937:
--

Summary: Extract JDBC tests from runner module to ignite-jdbc
Key: IGNITE-20937
URL: https://issues.apache.org/jira/browse/IGNITE-20937
Project: Ignite
Issue Type: Improvement
Components: sql
Reporter: Yury Gerzhedovich

The runner module incorporates a big amount of tests related to other modules, which leads to a long running time for the suite on TC.

Currently, integration tests for JDBC functionality are located in the ignite-runner module. They need to be extracted to the JDBC module via the runner test-fixtures support to decrease the test execution time of the runner module.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (IGNITE-20863) Selecting indexes when performing update operations for partition
[ https://issues.apache.org/jira/browse/IGNITE-20863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789002#comment-17789002 ] Roman Puchkovskiy commented on IGNITE-20863: The patch looks good to me > Selecting indexes when performing update operations for partition > - > > Key: IGNITE-20863 > URL: https://issues.apache.org/jira/browse/IGNITE-20863 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > To implement IGNITE-20125, we need to correctly select the indexes into which > we will insert data during RW transactions update operations for partitions. > It is enough for us to collect all available and registered indexes at the > time of the operation, as well as all dropped available indexes that we can > know about. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20936) Sql. Introduce container for dynamic parameters.
[ https://issues.apache.org/jira/browse/IGNITE-20936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Zhuravkov updated IGNITE-20936:
--

Labels: ignite-3 (was: )

> Sql. Introduce container for dynamic parameters.
> 
>
> Key: IGNITE-20936
> URL: https://issues.apache.org/jira/browse/IGNITE-20936
> Project: Ignite
> Issue Type: Improvement
> Components: sql
> Reporter: Maksim Zhuravkov
> Priority: Minor
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> At the moment the SQL validator uses an array to store the values of dynamic parameters.
> Let's replace the array of dynamic parameters with a container to make it possible to leave some parameters unspecified.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Assigned] (IGNITE-20936) Sql. Introduce container for dynamic parameters.
[ https://issues.apache.org/jira/browse/IGNITE-20936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Zhuravkov reassigned IGNITE-20936:
-

Assignee: Maksim Zhuravkov

> Sql. Introduce container for dynamic parameters.
> 
>
> Key: IGNITE-20936
> URL: https://issues.apache.org/jira/browse/IGNITE-20936
> Project: Ignite
> Issue Type: Improvement
> Components: sql
> Reporter: Maksim Zhuravkov
> Assignee: Maksim Zhuravkov
> Priority: Minor
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> At the moment the SQL validator uses an array to store the values of dynamic parameters.
> Let's replace the array of dynamic parameters with a container to make it possible to leave some parameters unspecified.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (IGNITE-20936) Sql. Introduce container for dynamic parameters.
Maksim Zhuravkov created IGNITE-20936:
-

Summary: Sql. Introduce container for dynamic parameters.
Key: IGNITE-20936
URL: https://issues.apache.org/jira/browse/IGNITE-20936
Project: Ignite
Issue Type: Improvement
Components: sql
Reporter: Maksim Zhuravkov
Fix For: 3.0.0-beta2

At the moment the SQL validator uses an array to store the values of dynamic parameters.

Let's replace the array of dynamic parameters with a container to make it possible to leave some parameters unspecified.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
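The container idea from the ticket can be sketched roughly as follows. This is a hypothetical minimal sketch, not the actual Ignite API: the class name `DynamicParameters` and its methods are invented here purely to illustrate distinguishing "parameter not specified" from "parameter explicitly set to null", which a plain array cannot do.

```java
import java.util.Arrays;

// Hypothetical sketch: a container for dynamic parameter values that can tell
// "parameter not specified" apart from "parameter explicitly set to null".
final class DynamicParameters {
    // Sentinel marking slots that were never assigned a value.
    private static final Object UNSPECIFIED = new Object();

    private final Object[] values;

    DynamicParameters(int count) {
        values = new Object[count];
        Arrays.fill(values, UNSPECIFIED);
    }

    void set(int index, Object value) {
        values[index] = value;
    }

    boolean isSpecified(int index) {
        return values[index] != UNSPECIFIED;
    }

    Object get(int index) {
        if (!isSpecified(index))
            throw new IllegalStateException("Parameter " + index + " is not specified");

        return values[index];
    }

    public static void main(String[] args) {
        DynamicParameters params = new DynamicParameters(2);

        params.set(0, null); // Explicitly set to null: still counts as specified.

        System.out.println(params.isSpecified(0)); // true
        System.out.println(params.isSpecified(1)); // false
    }
}
```

With a bare `Object[]`, a `null` slot is ambiguous; the sentinel object resolves that ambiguity without boxing every value into an extra wrapper.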
[jira] [Commented] (IGNITE-19905) Race between stable assignments application and table removal
[ https://issues.apache.org/jira/browse/IGNITE-19905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788939#comment-17788939 ]

Kirill Gusakov commented on IGNITE-19905:
-

Actually, we have no better way than the simple direct check for null. Moreover, the other cleanup operations:
- ReplicaManager.stopReplica
- Loza.stopRaftNodes
are ready to handle stopping a non-existent entity. So, under this ticket we just want to clean up the TODOs in the source code.

> Race between stable assignments application and table removal
> -
>
> Key: IGNITE-19905
> URL: https://issues.apache.org/jira/browse/IGNITE-19905
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Assignee: Kirill Gusakov
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> *Motivation*
> It is possible that handleChangeStableAssignmentEvent() gets an event such that for the corresponding revision the table is already removed, so an NPE happens when trying to destroy a partition storage (at the moment it is handled by the workaround IGNITE-19906).
> *Definition of done*
> - The logic for partition raft group and storage cleanup is not even executed if the table does not exist anymore.
> *Implementation notes*
> - We must first check whether the table still exists at the current revision and only then execute the cleanup logic.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (IGNITE-20737) Idle Verify fails with "Cluster not idle" error when log level is set to DEBUG
[ https://issues.apache.org/jira/browse/IGNITE-20737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788865#comment-17788865 ]

Ignite TC Bot commented on IGNITE-20737:

{panel:title=Branch: [pull/11052/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/11052/head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *-- Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7617485buildTypeId=IgniteTests24Java8_RunAll]

> Idle Verify fails with "Cluster not idle" error when log level is set to DEBUG
> --
>
> Key: IGNITE-20737
> URL: https://issues.apache.org/jira/browse/IGNITE-20737
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.15
> Reporter: Valery Shorin
> Assignee: Egor Fomin
> Priority: Major
> Labels: ise
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> If idle verify is executed with the DEBUG log level, it fails with the error:
>
> {code:java}
> Exception: org.apache.ignite.IgniteException Cluster not idle. Modifications found in caches or groups: [grpName=default, grpId=1544803905, partId=30] changed during size calculation [updCntrBefore=Counter [init=0, val=1], updCntrAfter=Counter [init=0, val=1]]
> {code}
>
> This issue can be reproduced by the following test (should be added to {{GridCommandHandlerTest}}):
>
> {code:java}
> @Test
> public void testCacheIdleVerifyLogLevelDebug() throws Exception {
>     IgniteEx ignite = startGrids(3);
>
>     ignite.cluster().state(ACTIVE);
>
>     IgniteCache cache = ignite.createCache(new CacheConfiguration<>(DEFAULT_CACHE_NAME)
>         .setAffinity(new RendezvousAffinityFunction(false, 32))
>         .setBackups(1));
>
>     cache.put("key", "value");
>
>     injectTestSystemOut();
>
>     setLoggerDebugLevel();
>
>     assertEquals(EXIT_CODE_OK, execute("--cache", "idle_verify"));
>
>     assertContains(log, testOut.toString(), "no conflicts have been found");
> } {code}
>
> The reason for this failure is that the {{equals()}} method is not defined for the {{PartitionUpdateCounterDebugWrapper}} class.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
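The root cause named in the ticket, a wrapper class without `equals()`, can be illustrated in miniature. The classes below are hypothetical stand-ins, not Ignite's actual `PartitionUpdateCounterDebugWrapper`: a wrapper that does not override `equals()` falls back to identity comparison, so two logically equal counters compare as "changed".

```java
// Minimal illustration (hypothetical classes, not Ignite's actual ones) of why
// a wrapper that does not override equals() makes a before/after comparison of
// logically equal counters report a spurious "modification".
public class EqualsDemo {
    static final class Counter {
        final long init;
        final long val;

        Counter(long init, long val) {
            this.init = init;
            this.val = val;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Counter))
                return false;

            Counter c = (Counter) o;

            return init == c.init && val == c.val;
        }

        @Override public int hashCode() {
            return Long.hashCode(init * 31 + val);
        }
    }

    // Debug wrapper that forgets to override equals(): inherits Object's
    // identity-based comparison.
    static final class DebugWrapper {
        final Counter delegate;

        DebugWrapper(Counter delegate) {
            this.delegate = delegate;
        }
    }

    public static void main(String[] args) {
        Counter before = new Counter(0, 1);
        Counter after = new Counter(0, 1);

        // Plain counters compare equal: no modification reported.
        System.out.println(before.equals(after)); // true

        // Wrapped counters compare by identity: spurious "modification".
        System.out.println(new DebugWrapper(before).equals(new DebugWrapper(after))); // false
    }
}
```

This matches the symptom in the error message: `updCntrBefore` and `updCntrAfter` print identically yet are reported as changed.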
[jira] [Updated] (IGNITE-20935) Remove MetaStorageManagerImpl#getService method
[ https://issues.apache.org/jira/browse/IGNITE-20935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20935: - Description: It is duplicated by the {{metaStorageServiceFuture > Remove MetaStorageManagerImpl#getService method > --- > > Key: IGNITE-20935 > URL: https://issues.apache.org/jira/browse/IGNITE-20935 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Aleksandr Polovtcev >Priority: Major > Labels: ignite-3 > > It is duplicated by the {{metaStorageServiceFuture -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20935) Remove MetaStorageManagerImpl#getService method
[ https://issues.apache.org/jira/browse/IGNITE-20935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20935: - Description: It is duplicated by the {{MetaStorageManagerImpl#metaStorageServiceFuture}} method. (was: It is duplicated by the {{metaStorageServiceFuture) > Remove MetaStorageManagerImpl#getService method > --- > > Key: IGNITE-20935 > URL: https://issues.apache.org/jira/browse/IGNITE-20935 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Aleksandr Polovtcev >Priority: Major > Labels: ignite-3 > > It is duplicated by the {{MetaStorageManagerImpl#metaStorageServiceFuture}} > method. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20935) Remove MetaStorageManagerImpl#getService method
Aleksandr Polovtcev created IGNITE-20935: Summary: Remove MetaStorageManagerImpl#getService method Key: IGNITE-20935 URL: https://issues.apache.org/jira/browse/IGNITE-20935 Project: Ignite Issue Type: Task Reporter: Aleksandr Polovtcev Assignee: Aleksandr Polovtcev -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20934) Worry about deleted indexes that will no longer be used until they are destroyed
Kirill Tkalenko created IGNITE-20934:

Summary: Worry about deleted indexes that will no longer be used until they are destroyed
Key: IGNITE-20934
URL: https://issues.apache.org/jira/browse/IGNITE-20934
Project: Ignite
Issue Type: Improvement
Reporter: Kirill Tkalenko

At the moment, when executing an RW transaction, we will write to existing (available and registered) indexes and to all dropped available indexes that existed in the past. This is necessary so as not to abort RW transactions that caught the index dropping event.

It won’t be good if we keep writing to old dropped available indexes for a very long time; we need to stop writing to them as quickly as possible. At the same time, long RW transactions that saw the dropping of the index should not be broken.

We could lean on the low watermark mechanism, but if it is made infinite, it will bring us a lot of pain. We need to think carefully about solving this problem.

I have the following thoughts: locally, each node will monitor the progress of its RW transactions, and as soon as the last RW transaction that caught the index dropping event disappears, immediately remove the index from the selection for new RW transactions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
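The local-tracking idea from the last paragraph can be sketched as follows. This is a hypothetical sketch, not Ignite code: the `DroppedIndexTracker` class and its callbacks are invented here to show counting the RW transactions that observed the dropped index and excluding the index from the write set once the last such transaction finishes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the local-tracking idea: count RW transactions that
// caught the index dropping event; when the last one finishes, the dropped
// index is excluded from the write set for new RW transactions.
class DroppedIndexTracker {
    private final AtomicInteger observers = new AtomicInteger();

    private volatile boolean writable = true;

    // Called for each RW transaction that observed the index drop.
    void onTxStart() {
        observers.incrementAndGet();
    }

    // Called when such a transaction commits or rolls back.
    void onTxFinish() {
        if (observers.decrementAndGet() == 0)
            writable = false; // Last observer gone: stop writing to the index.
    }

    // New RW transactions consult this to decide whether the dropped index
    // still belongs to their write set.
    boolean mustWrite() {
        return writable;
    }

    public static void main(String[] args) {
        DroppedIndexTracker tracker = new DroppedIndexTracker();

        tracker.onTxStart();
        tracker.onTxStart();

        tracker.onTxFinish();
        System.out.println(tracker.mustWrite()); // true: one observer remains

        tracker.onTxFinish();
        System.out.println(tracker.mustWrite()); // false: stop writing
    }
}
```

The atomic counter keeps this decision purely local to the node, which is the point of the proposal: no cluster-wide watermark is needed to retire the index from new transactions' write sets.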
[jira] [Updated] (IGNITE-20933) Remove InternalTableImpl.assignments
[ https://issues.apache.org/jira/browse/IGNITE-20933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-20933: Description: After IGNITE-20701 and IGNITE-19619 we no longer need *InternalTableImpl.assignments*. Remove it. (was: After IGNITE-20701 and IGNITE-19619 we no longer need ) > Remove InternalTableImpl.assignments > > > Key: IGNITE-20933 > URL: https://issues.apache.org/jira/browse/IGNITE-20933 > Project: Ignite > Issue Type: Improvement >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > After IGNITE-20701 and IGNITE-19619 we no longer need > *InternalTableImpl.assignments*. Remove it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20933) Remove InternalTableImpl.assignments
[ https://issues.apache.org/jira/browse/IGNITE-20933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-20933: Description: After IGNITE-20701 and IGNITE-19619 we no longer need > Remove InternalTableImpl.assignments > > > Key: IGNITE-20933 > URL: https://issues.apache.org/jira/browse/IGNITE-20933 > Project: Ignite > Issue Type: Improvement >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > After IGNITE-20701 and IGNITE-19619 we no longer need -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20933) Remove InternalTableImpl.assignments
Pavel Tupitsyn created IGNITE-20933: --- Summary: Remove InternalTableImpl.assignments Key: IGNITE-20933 URL: https://issues.apache.org/jira/browse/IGNITE-20933 Project: Ignite Issue Type: Improvement Affects Versions: 3.0.0-beta1 Reporter: Pavel Tupitsyn Fix For: 3.0.0-beta2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20910) Add a test for inserting data after restart
[ https://issues.apache.org/jira/browse/IGNITE-20910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20910: - Fix Version/s: 3.0.0-beta2 > Add a test for inserting data after restart > --- > > Key: IGNITE-20910 > URL: https://issues.apache.org/jira/browse/IGNITE-20910 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Aleksandr Polovtcev >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 50m > Remaining Estimate: 0h > > When working on a different task, I discovered that inserting around 2000 > tuples into a single Ignite node doesn't work, if the node has been restarted > before. > This ticket is only about adding a test that reproduces this issue, fixes > should be implemented as part of other tickets. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20932) Update Linux packages version to 2.16
Nikita Amelchev created IGNITE-20932: Summary: Update Linux packages version to 2.16 Key: IGNITE-20932 URL: https://issues.apache.org/jira/browse/IGNITE-20932 Project: Ignite Issue Type: Sub-task Reporter: Nikita Amelchev Assignee: Nikita Amelchev Update Linux packages version to 2.16 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20928) Update ignite-2.16 branch version to 2.16.0
[ https://issues.apache.org/jira/browse/IGNITE-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Amelchev updated IGNITE-20928: - Ignite Flags: (was: Docs Required,Release Notes Required) > Update ignite-2.16 branch version to 2.16.0 > --- > > Key: IGNITE-20928 > URL: https://issues.apache.org/jira/browse/IGNITE-20928 > Project: Ignite > Issue Type: Sub-task >Reporter: Nikita Amelchev >Assignee: Nikita Amelchev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Update ignite-2.16 branch version to 2.16.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-20910) Add a test for inserting data after restart
[ https://issues.apache.org/jira/browse/IGNITE-20910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788824#comment-17788824 ] Vladislav Pyatkov commented on IGNITE-20910: LGTM > Add a test for inserting data after restart > --- > > Key: IGNITE-20910 > URL: https://issues.apache.org/jira/browse/IGNITE-20910 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Aleksandr Polovtcev >Priority: Major > Labels: ignite-3 > Time Spent: 40m > Remaining Estimate: 0h > > When working on a different task, I discovered that inserting around 2000 > tuples into a single Ignite node doesn't work, if the node has been restarted > before. > This ticket is only about adding a test that reproduces this issue, fixes > should be implemented as part of other tickets. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IGNITE-20928) Update ignite-2.16 branch version to 2.16.0
[ https://issues.apache.org/jira/browse/IGNITE-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Amelchev resolved IGNITE-20928. -- Resolution: Fixed Merged into the 2.16. [~PetrovMikhail], Thank you for the review. > Update ignite-2.16 branch version to 2.16.0 > --- > > Key: IGNITE-20928 > URL: https://issues.apache.org/jira/browse/IGNITE-20928 > Project: Ignite > Issue Type: Sub-task >Reporter: Nikita Amelchev >Assignee: Nikita Amelchev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Update ignite-2.16 branch version to 2.16.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-20929) Update Ignite version to 2.17.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/IGNITE-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788820#comment-17788820 ] Nikita Amelchev commented on IGNITE-20929: -- Merged into the master. [~PetrovMikhail], Thank you for the review. > Update Ignite version to 2.17.0-SNAPSHOT > > > Key: IGNITE-20929 > URL: https://issues.apache.org/jira/browse/IGNITE-20929 > Project: Ignite > Issue Type: Sub-task >Reporter: Nikita Amelchev >Assignee: Nikita Amelchev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Update Ignite version to 2.17.0-SNAPSHOT -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IGNITE-20930) Ignite-extensions: Change version of Ignite dependency to 2.17.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/IGNITE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Amelchev resolved IGNITE-20930. -- Resolution: Fixed Merged into the master. [~PetrovMikhail], Thank you for the review. > Ignite-extensions: Change version of Ignite dependency to 2.17.0-SNAPSHOT > - > > Key: IGNITE-20930 > URL: https://issues.apache.org/jira/browse/IGNITE-20930 > Project: Ignite > Issue Type: Sub-task >Reporter: Nikita Amelchev >Assignee: Nikita Amelchev >Priority: Major > > Ignite-extensions: Change version of Ignite dependency to 2.17.0-SNAPSHOT -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20930) Ignite-extensions: Change version of Ignite dependency to 2.17.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/IGNITE-20930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Amelchev updated IGNITE-20930: - Ignite Flags: (was: Docs Required,Release Notes Required) > Ignite-extensions: Change version of Ignite dependency to 2.17.0-SNAPSHOT > - > > Key: IGNITE-20930 > URL: https://issues.apache.org/jira/browse/IGNITE-20930 > Project: Ignite > Issue Type: Sub-task >Reporter: Nikita Amelchev >Assignee: Nikita Amelchev >Priority: Major > > Ignite-extensions: Change version of Ignite dependency to 2.17.0-SNAPSHOT -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20929) Update Ignite version to 2.17.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/IGNITE-20929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Amelchev updated IGNITE-20929: - Ignite Flags: (was: Docs Required,Release Notes Required) > Update Ignite version to 2.17.0-SNAPSHOT > > > Key: IGNITE-20929 > URL: https://issues.apache.org/jira/browse/IGNITE-20929 > Project: Ignite > Issue Type: Sub-task >Reporter: Nikita Amelchev >Assignee: Nikita Amelchev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Update Ignite version to 2.17.0-SNAPSHOT -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20931) ignite-website: Add documentation, release notes and download links for 2.16.0 release
Nikita Amelchev created IGNITE-20931: Summary: ignite-website: Add documentation, release notes and download links for 2.16.0 release Key: IGNITE-20931 URL: https://issues.apache.org/jira/browse/IGNITE-20931 Project: Ignite Issue Type: Sub-task Reporter: Nikita Amelchev Assignee: Nikita Amelchev ignite-website: Add documentation, release notes and download links for 2.16.0 release -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20930) Ignite-extensions: Change version of Ignite dependency to 2.17.0-SNAPSHOT
Nikita Amelchev created IGNITE-20930: Summary: Ignite-extensions: Change version of Ignite dependency to 2.17.0-SNAPSHOT Key: IGNITE-20930 URL: https://issues.apache.org/jira/browse/IGNITE-20930 Project: Ignite Issue Type: Sub-task Reporter: Nikita Amelchev Assignee: Nikita Amelchev Ignite-extensions: Change version of Ignite dependency to 2.17.0-SNAPSHOT -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20929) Update Ignite version to 2.17.0-SNAPSHOT
Nikita Amelchev created IGNITE-20929: Summary: Update Ignite version to 2.17.0-SNAPSHOT Key: IGNITE-20929 URL: https://issues.apache.org/jira/browse/IGNITE-20929 Project: Ignite Issue Type: Sub-task Reporter: Nikita Amelchev Assignee: Nikita Amelchev Update Ignite version to 2.17.0-SNAPSHOT -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20927) Update IgniteReleasedVersion for compatibility tests to 2.16.0
Nikita Amelchev created IGNITE-20927: Summary: Update IgniteReleasedVersion for compatibility tests to 2.16.0 Key: IGNITE-20927 URL: https://issues.apache.org/jira/browse/IGNITE-20927 Project: Ignite Issue Type: Sub-task Reporter: Nikita Amelchev Assignee: Nikita Amelchev Update IgniteReleasedVersion for compatibility tests to 2.16.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20926) Update Apache Ignite 2.16 release notes
Nikita Amelchev created IGNITE-20926: Summary: Update Apache Ignite 2.16 release notes Key: IGNITE-20926 URL: https://issues.apache.org/jira/browse/IGNITE-20926 Project: Ignite Issue Type: Sub-task Reporter: Nikita Amelchev Assignee: Nikita Amelchev Update Apache Ignite 2.16 release notes -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20928) Update ignite-2.16 branch version to 2.16.0
Nikita Amelchev created IGNITE-20928: Summary: Update ignite-2.16 branch version to 2.16.0 Key: IGNITE-20928 URL: https://issues.apache.org/jira/browse/IGNITE-20928 Project: Ignite Issue Type: Sub-task Reporter: Nikita Amelchev Assignee: Nikita Amelchev Update ignite-2.16 branch version to 2.16.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20925) Sql. Await primary replica only once per transaction, avoid of terms in enlist.
Evgeny Stanilovsky created IGNITE-20925:
---

Summary: Sql. Await primary replica only once per transaction, avoid of terms in enlist.
Key: IGNITE-20925
URL: https://issues.apache.org/jira/browse/IGNITE-20925
Project: Ignite
Issue Type: Improvement
Components: sql
Affects Versions: 3.0.0-beta1
Reporter: Evgeny Stanilovsky

After [1], for statements like:

START TRANSACTION;
INSERT INTO T1 VALUES(1);
INSERT INTO T1 VALUES(2);
COMMIT;

we will call the primary partition detection code SqlQueryProcessor#primaryReplicas as many times as insert operations are issued. This is odd because:

1. If the transaction is already enlisted into NODE1 for TABLE1_PARTITION1, then in the scope of this transaction it can't be enlisted into a different NODE2 for TABLE1_PARTITION1 due to lease expiration or node failure scenarios.
2. It adds useless time consumption from executing this code repeatedly.

Additionally, it seems we can call [2] for already awaited primary partitions.

Thus the proposal is:

1. Call primaryReplicas only once per source and store the result of this call for further mapping planning.
2. Change the semantics of IgniteRelShuttle#enlist(int tableId, List assignments) to IgniteRelShuttle#enlist(int tableId, HybridTimestamp clock); it seems there is no need to transfer assignments here, since they can be obtained from [2]. Also, we can check tx.enlistedNodeAndTerm(tablePartId) and ReplicaMeta#getStartTime for term equality: if they are not equal, a fast exception can be raised; otherwise we would need to wait for such an exception from the transaction's erroneous enlisting logic.

[1] https://issues.apache.org/jira/browse/IGNITE-19619
[2] PlacementDriver#getPrimaryReplica

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
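Point 1 of the proposal amounts to memoizing the primary-replica lookup per partition within a transaction. A minimal sketch under stated assumptions: `TxReplicaCache` and the string node names are hypothetical, and the `resolver` function stands in for the real placement-driver lookup; only the caching shape is illustrated, not Ignite's actual enlistment code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: within one transaction, resolve the primary replica for
// each partition only once and reuse the result for subsequent statements,
// instead of re-detecting it on every INSERT.
class TxReplicaCache {
    private final Map<Integer, String> primaryByPartition = new ConcurrentHashMap<>();

    // 'resolver' stands in for the real primary-replica detection (e.g. a
    // placement driver call); it runs at most once per partition.
    String primaryReplica(int partitionId, Function<Integer, String> resolver) {
        return primaryByPartition.computeIfAbsent(partitionId, resolver);
    }

    public static void main(String[] args) {
        TxReplicaCache cache = new TxReplicaCache();

        int[] lookups = {0};
        Function<Integer, String> resolver = p -> { lookups[0]++; return "NODE1"; };

        cache.primaryReplica(1, resolver); // first INSERT: real lookup
        cache.primaryReplica(1, resolver); // second INSERT: served from cache

        System.out.println(lookups[0]); // 1
    }
}
```

This mirrors the ticket's observation: once the transaction is enlisted into NODE1 for a partition, it cannot legitimately move to another node within the same transaction, so repeating the detection per statement buys nothing.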
[jira] [Updated] (IGNITE-20896) Fix MultifieldIndexQueryTest#testCheckBoundaries flakiness
[ https://issues.apache.org/jira/browse/IGNITE-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-20896: --- Description: Test fails with an error: {code:java} java.lang.AssertionError: Expected :2 Actual :3 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) at org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) {code} The test fails because of an incorrect result set of the index query (extra test output added in PR [1]): {code} > [Entry [key=1, val=Person [id=0, secId=1, descId=1]], Entry [key=1, > val=Person [id=0, secId=1, descId=1]], Entry [key=3, val=Person [id=1, > secId=1, descId=1]]] {code} or {code} >> [Entry [key=3, val=Person [id=1, secId=1, descId=1]]] {code} # https://github.com/apache/ignite/pull/11062 was: Test fails with an error: {code:java} java.lang.AssertionError: Expected :2 Actual :3 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) at org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) {code} We are failing because of incorrect result set of index query (extra test output added in PR [1]): {code} > [Entry [key=1, val=Person [id=0, secId=1, descId=1]], Entry [key=1, > val=Person [id=0, secId=1, descId=1]], Entry [key=3, val=Person [id=1, > secId=1, descId=1]]] {code} or {code} {code} # https://github.com/apache/ignite/pull/11062 > Fix MultifieldIndexQueryTest#testCheckBoundaries flakiness > -- > > Key: IGNITE-20896 
> URL: https://issues.apache.org/jira/browse/IGNITE-20896 > Project: Ignite > Issue Type: Test >Reporter: Ilya Shishkov >Priority: Trivial > Labels: ise > Time Spent: 10m > Remaining Estimate: 0h > > Test fails with an error: > {code:java} > java.lang.AssertionError: > Expected :2 > Actual :3 > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) > at > org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) > {code} > The test fails because of an incorrect result set of the index query (extra > test output added in PR [1]): > {code} > > [Entry [key=1, val=Person [id=0, secId=1, descId=1]], Entry [key=1, > > val=Person [id=0, secId=1, descId=1]], Entry [key=3, val=Person [id=1, > > secId=1, descId=1]]] > {code} > or > {code} > >> [Entry [key=3, val=Person [id=1, secId=1, descId=1]]] > {code} > # https://github.com/apache/ignite/pull/11062 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20896) Fix MultifieldIndexQueryTest#testCheckBoundaries flakiness
[ https://issues.apache.org/jira/browse/IGNITE-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-20896: --- Description: Test fails with an error: {code:java} java.lang.AssertionError: Expected :2 Actual :3 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) at org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) {code} The test fails because of an incorrect result set of the index query (extra test output added in PR [1]): {code} > [Entry [key=1, val=Person [id=0, secId=1, descId=1]], Entry [key=1, > val=Person [id=0, secId=1, descId=1]], Entry [key=3, val=Person [id=1, > secId=1, descId=1]]] {code} or {code} {code} # https://github.com/apache/ignite/pull/11062 was: Test fails with an error: {code} java.lang.AssertionError: Expected :2 Actual :3 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) at org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) {code} > Fix MultifieldIndexQueryTest#testCheckBoundaries flakiness > -- > > Key: IGNITE-20896 > URL: https://issues.apache.org/jira/browse/IGNITE-20896 > Project: Ignite > Issue Type: Test >Reporter: Ilya Shishkov >Priority: Trivial > Labels: ise > Time Spent: 10m > Remaining Estimate: 0h > > Test fails with an error: > {code:java} > java.lang.AssertionError: > Expected :2 > Actual :3 > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at 
org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) > at > org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) > {code} > The test fails because of an incorrect result set of the index query (extra > test output added in PR [1]): > {code} > > [Entry [key=1, val=Person [id=0, secId=1, descId=1]], Entry [key=1, > > val=Person [id=0, secId=1, descId=1]], Entry [key=3, val=Person [id=1, > > secId=1, descId=1]]] > {code} > or > {code} > {code} > # https://github.com/apache/ignite/pull/11062 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-20918) Leases expire after a node has been restarted
[ https://issues.apache.org/jira/browse/IGNITE-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev reassigned IGNITE-20918: Assignee: Vladislav Pyatkov (was: Alexander Lapin) > Leases expire after a node has been restarted > - > > Key: IGNITE-20918 > URL: https://issues.apache.org/jira/browse/IGNITE-20918 > Project: Ignite > Issue Type: Bug >Reporter: Aleksandr Polovtcev >Assignee: Vladislav Pyatkov >Priority: Critical > Labels: ignite-3 > Time Spent: 20m > Remaining Estimate: 0h > > IGNITE-20910 introduces a test that inserts some data after restarting a > node. For some reason, after some time, I can see the following messages in > the log: > {noformat} > [2023-11-22T10:00:17,056][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_19] > [2023-11-22T10:00:17,057][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_0] > [2023-11-22T10:00:17,057][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_9] > [2023-11-22T10:00:17,057][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_10] > {noformat} > After that, the test fails with a {{PrimaryReplicaMissException}}. The > problem here is that a single node is never expected to have > expired leases; they should be prolonged automatically. I think that this > happens because the initial lease that was issued before the node was > restarted is still accepted by the node after restart. -- This message was sent by Atlassian Jira (v8.20.10#820010)
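The hypothesis above — that a lease issued before the restart is still accepted afterwards — suggests filtering incoming leases by the node's start time. Below is a minimal sketch of that idea, assuming leases carry an issue timestamp; `LeaseFilter` and its methods are hypothetical illustrations, not the actual Ignite 3 placement-driver API:

```java
import java.time.Instant;

// Hypothetical sketch: a restarted node remembers when it started and rejects
// lease grants issued before that instant, so a stale pre-restart lease cannot
// be accepted and block automatic prolongation.
class LeaseFilter {
    private final Instant nodeStartTime;

    LeaseFilter(Instant nodeStartTime) {
        this.nodeStartTime = nodeStartTime;
    }

    /** Accepts a lease only if it was issued at or after this node's start. */
    boolean accept(Instant leaseIssuedAt) {
        return !leaseIssuedAt.isBefore(nodeStartTime);
    }

    public static void main(String[] args) {
        LeaseFilter filter = new LeaseFilter(Instant.parse("2023-11-22T10:00:00Z"));

        // A lease issued before the restart is stale and must be rejected.
        System.out.println(filter.accept(Instant.parse("2023-11-22T09:59:00Z"))); // false
        // A lease issued after the restart is accepted and can be prolonged.
        System.out.println(filter.accept(Instant.parse("2023-11-22T10:00:05Z"))); // true
    }
}
```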
[jira] [Updated] (IGNITE-20896) Fix MultifieldIndexQueryTest#testCheckBoundaries flakiness
[ https://issues.apache.org/jira/browse/IGNITE-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-20896: --- Description: Test fails with an error: {code} java.lang.AssertionError: Expected :2 Actual :3 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) at org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) {code} was: Test fails with an error: {code} java.lang.AssertionError: Expected :2 Actual :3 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) at org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) {code} > Fix MultifieldIndexQueryTest#testCheckBoundaries flakiness > -- > > Key: IGNITE-20896 > URL: https://issues.apache.org/jira/browse/IGNITE-20896 > Project: Ignite > Issue Type: Test >Reporter: Ilya Shishkov >Priority: Trivial > Labels: ise > > Test fails with an error: > {code} > java.lang.AssertionError: > Expected :2 > Actual :3 > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) > at > org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20896) Fix MultifieldIndexQueryTest#testCheckBoundaries flakiness
[ https://issues.apache.org/jira/browse/IGNITE-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-20896: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Fix MultifieldIndexQueryTest#testCheckBoundaries flakiness > -- > > Key: IGNITE-20896 > URL: https://issues.apache.org/jira/browse/IGNITE-20896 > Project: Ignite > Issue Type: Test >Reporter: Ilya Shishkov >Priority: Trivial > Labels: ise > > Test fails with an error: > {code} > java.lang.AssertionError: > Expected :2 > Actual :3 > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.ignite.testframework.junits.JUnitAssertAware.assertEquals(JUnitAssertAware.java:95) > at > org.apache.ignite.cache.query.MultifieldIndexQueryTest.testCheckBoundaries(MultifieldIndexQueryTest.java:170) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20924) Add tests for parallel building of a new index and table updates
Kirill Tkalenko created IGNITE-20924: Summary: Add tests for parallel building of a new index and table updates Key: IGNITE-20924 URL: https://issues.apache.org/jira/browse/IGNITE-20924 Project: Ignite Issue Type: Improvement Reporter: Kirill Tkalenko Assignee: Kirill Tkalenko Fix For: 3.0.0-beta2 I discovered that we do not have tests in which, while an index is being built, we make changes to the table, such as inserts or updates; we need to add such integration tests. It would also be good to move the integration tests associated with building indexes into the index module. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20923) Possible optimization for sending row IDs during index building
Kirill Tkalenko created IGNITE-20923: Summary: Possible optimization for sending row IDs during index building Key: IGNITE-20923 URL: https://issues.apache.org/jira/browse/IGNITE-20923 Project: Ignite Issue Type: Improvement Reporter: Kirill Tkalenko Currently, when building an index, we send all row IDs from the row store, from the smallest to the end of the store, and we build index entries for all of them. I think we do not need to send all row IDs, only those whose version chains contain versions created at or before the index creation time. Newer rows will be handled during normal cluster operation by the replication protocol or by full rebalancing. -- This message was sent by Atlassian Jira (v8.20.10#820010)
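The proposed optimization can be sketched as a simple filter over version chains. This is a simplified model — integer row IDs and plain commit timestamps stand in for real version chains — and none of these names come from the actual Ignite 3 storage API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified sketch: during index building, send only row IDs whose version
// chain contains at least one version created at or before the index creation
// time; rows written later are indexed by the replication protocol anyway.
class BuildIndexRowFilter {
    /** versionChains: rowId -> commit timestamps of its version chain. */
    static List<Integer> rowIdsToSend(Map<Integer, long[]> versionChains, long indexCreationTs) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, long[]> chain : versionChains.entrySet()) {
            for (long versionTs : chain.getValue()) {
                if (versionTs <= indexCreationTs) {
                    result.add(chain.getKey());
                    break; // one qualifying version is enough to send the row
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, long[]> chains = new TreeMap<>();
        chains.put(1, new long[] {50L, 10L});   // has versions before ts=100 -> send
        chains.put(2, new long[] {150L});       // only newer versions -> skip
        chains.put(3, new long[] {120L, 90L});  // one older version -> send
        System.out.println(rowIdsToSend(chains, 100L)); // prints [1, 3]
    }
}
```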
[jira] [Updated] (IGNITE-20863) Selecting indexes when performing update operations for partition
[ https://issues.apache.org/jira/browse/IGNITE-20863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko updated IGNITE-20863: - Description: To implement IGNITE-20125, we need to correctly select the indexes into which we will insert data during RW transactions update operations for partitions. It is enough for us to collect all available and registered indexes at the time of the operation, as well as all dropped available indexes that we can know about. was: To implement IGNITE-20125, we need to correctly select the indexes into which we will insert data during RW transactions update operations for partitions. What do we need for this: # Catalog version of the transaction by beginTs of the txId. # Catalog version of the update operation for the transaction. An approximate index selection algorithm: # Start with the catalog version at the beginning of the transaction. # Selecting all indexes by catalog version for a specific table. ## If the index is not in the resulting selection, then simply add it. ## If the index is already present in the resulting selection, then we do nothing. ## If the index is present in the resulting selection but not in the catalog version, then depending on its status it will be: ### registered - the index will be removed from the resulting selection. ### available - will remain in the resulting selection. # Increase the catalog version, if it is greater than the directory version for the update operation, then we will complete the index selection, else go to p. 2. 
> Selecting indexes when performing update operations for partition > - > > Key: IGNITE-20863 > URL: https://issues.apache.org/jira/browse/IGNITE-20863 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > > To implement IGNITE-20125, we need to correctly select the indexes into which > we will insert data during RW transactions update operations for partitions. > It is enough for us to collect all available and registered indexes at the > time of the operation, as well as all dropped available indexes that we can > know about. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20863) Selecting indexes when performing update operations for partition
[ https://issues.apache.org/jira/browse/IGNITE-20863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko updated IGNITE-20863: - Description: To implement IGNITE-20125, we need to correctly select the indexes into which we will insert data during RW transactions update operations for partitions. What do we need for this: # Catalog version of the transaction by beginTs of the txId. # Catalog version of the update operation for the transaction. An approximate index selection algorithm: # Start with the catalog version at the beginning of the transaction. # Selecting all indexes by catalog version for a specific table. ## If the index is not in the resulting selection, then simply add it. ## If the index is already present in the resulting selection, then we do nothing. ## If the index is present in the resulting selection but not in the catalog version, then depending on its status it will be: ### registered - the index will be removed from the resulting selection. ### available - will remain in the resulting selection. # Increase the catalog version, if it is greater than the catalog version for the update operation, then we will complete the index selection, else go to p. 2. was: To implement IGNITE-20125, we need to correctly select the indexes into which we will insert data during update operations for partitions. What do we need for this: # Catalog version of the transaction by beginTs of the txId. # Catalog version of the update operation for the transaction. An approximate index selection algorithm: # Start with the catalog version at the beginning of the transaction. # Selecting all indexes by catalog version for a specific table. ## If the index is not in the resulting selection, then simply add it. ## If the index is already present in the resulting selection, then we do 
nothing. ## If the index is present in the resulting selection but not in the catalog version, then depending on its status it will be: ### registered - the index will be removed from the resulting selection. ### available - will remain in the resulting selection. # Increase the catalog version, if it is greater than the directory version for the update operation, then we will complete the index selection, else go to p. 2. > Selecting indexes when performing update operations for partition > - > > Key: IGNITE-20863 > URL: https://issues.apache.org/jira/browse/IGNITE-20863 > Project: Ignite > Issue Type: Improvement >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > > To implement IGNITE-20125, we need to correctly select the indexes into which > we will insert data during RW transactions update operations for partitions. > What do we need for this: > # Catalog version of the transaction by beginTs of the txId. > # Catalog version of the update operation for the transaction. > An approximate index selection algorithm: > # Start with the catalog version at the beginning of the transaction. > # Selecting all indexes by catalog version for a specific table. > ## If the index is not in the resulting selection, then simply add it. > ## If the index is already present in the resulting selection, then we do > nothing. > ## If the index is present in the resulting selection but not in the catalog > version, then depending on its status it will be: > ### registered - the index will be removed from the resulting selection. > ### available - will remain in the resulting selection. > # Increase the catalog version, if it is greater than the catalog version > for the update operation, then we will complete the index selection, else go > to p. 2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
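The selection algorithm described in this thread can be sketched as a loop over catalog versions. This is a simplified model; `Index`, `Status`, and the per-version catalog map are stand-ins for the real catalog structures, not the actual Ignite 3 API:

```java
import java.util.*;

// Hypothetical sketch of the index selection loop: walk catalog versions from
// the transaction's begin version to the update operation's version, adding new
// indexes and dropping indexes that disappeared while still only registered.
class IndexSelection {
    enum Status { REGISTERED, AVAILABLE }
    record Index(String name, Status status) {}

    /** catalogByVersion.get(v) = indexes of the table in catalog version v. */
    static Collection<Index> select(Map<Integer, List<Index>> catalogByVersion,
                                    int txBeginVersion, int operationVersion) {
        Map<String, Index> selection = new LinkedHashMap<>();
        for (int v = txBeginVersion; v <= operationVersion; v++) {
            List<Index> inVersion = catalogByVersion.getOrDefault(v, List.of());
            Set<String> namesInVersion = new HashSet<>();
            for (Index idx : inVersion) {
                namesInVersion.add(idx.name());
                selection.putIfAbsent(idx.name(), idx); // new index: add; known index: keep
            }
            // An index that is absent from this catalog version is removed from the
            // selection only if it never became available.
            selection.values().removeIf(idx ->
                    !namesInVersion.contains(idx.name()) && idx.status() == Status.REGISTERED);
        }
        return selection.values();
    }

    public static void main(String[] args) {
        Map<Integer, List<Index>> catalog = Map.of(
                1, List.of(new Index("idx_a", Status.AVAILABLE),
                           new Index("idx_b", Status.REGISTERED)),
                2, List.of(new Index("idx_a", Status.AVAILABLE))); // idx_b dropped while registered
        System.out.println(select(catalog, 1, 2)); // only idx_a remains
    }
}
```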
[jira] [Updated] (IGNITE-20922) Java Thin: direct service invocation with ClientClusterGroup and Service Awareness
[ https://issues.apache.org/jira/browse/IGNITE-20922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Steshin updated IGNITE-20922: -- Description: Once we implemented Service Awareness for Java Thin Client, we might improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `IgniteClient.services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or to node B as user required. We should prevent the redirection call at #5 and call service only on A or B. And this would help user to choose nodes for loading purposes. It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. the client can exclude node C from the options where to send the invocation request. If the user chooses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. was: Once we implemented Service Awareness for Java Thin Client, we might improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `IgniteClient.services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or to node B as user required. We should prevent the redirection call at #5 and call service only on A or B. 
It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. the client can exclude node C from the options where to send the invocation request. If user chosses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. > Java Thin: direct service invocation with ClientClusterGroup and Service > Awareness > -- > > Key: IGNITE-20922 > URL: https://issues.apache.org/jira/browse/IGNITE-20922 > Project: Ignite > Issue Type: Improvement >Reporter: Vladimir Steshin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Once we implemented Service Awareness for Java Thin Client, we might improve > the service invocation with `IgniteClient.services(ClientClusterGroup grp)` > Consider: > 1) There is a cluster with some nodes A, B, C, D, F > 1) A service is deployed on nodes A,B,C > 2) Service Awareness is enabled. > 3) User limits the service instance nodes set with > `IgniteClient.services('node A', 'node B')` skipping node C. > 4) The thin client requests the service randomly on node C because it has the > service instance too. > 5) Node C redirects the invocation to node A or to node B as user required. > We should prevent the redirection call at #5 and call service only on A or B. > And this would help user to choose nodes for loading purposes. > It looks simple to intersect passed `ClientClusterGroup` and known set of > service instance nodes before the service calling. I.e. the client can > exclude node C from the options where to send the invocation request. > If the user chooses `services('node A', 'node B', 'node F')`, where F has no > service instance, we do not invoke F. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20922) Java Thin: direct service invocation with ClientClusterGroup and Service Awareness
[ https://issues.apache.org/jira/browse/IGNITE-20922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Steshin updated IGNITE-20922: -- Description: Once we implemented Service Awareness for Java Thin Client, we might improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `IgniteClient.services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or node B as user required. We should prevent the redirection call at #5 and call service only on A or B. It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. the client can exclude node C from the options where to send the invocation request. If the user chooses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. was: Once we implemented Service Awareness for Java Thin Client, we might improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or node B as user required. We should prevent the redirection call at #5 and call service only on A or B. It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. 
the client can exclude node C from the options where to send the invocation request. If user chosses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. > Java Thin: direct service invocation with ClientClusterGroup and Service > Awareness > -- > > Key: IGNITE-20922 > URL: https://issues.apache.org/jira/browse/IGNITE-20922 > Project: Ignite > Issue Type: Improvement >Reporter: Vladimir Steshin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Once we implemented Service Awareness for Java Thin Client, we might improve > the service invocation with `IgniteClient.services(ClientClusterGroup grp)` > Consider: > 1) There is a cluster with some nodes A, B, C, D, F > 1) A service is deployed on nodes A,B,C > 2) Service Awareness is enabled. > 3) User limits the service instance nodes set with > `IgniteClient.services('node A', 'node B')` skipping node C. > 4) The thin client requests the service randomly on node C because it has the > service instance too. > 5) Node C redirects the invocation to node A or node B as user required. > We should prevent the redirection call at #5 and call service only on A or B. > It looks simple to intersect passed `ClientClusterGroup` and known set of > service instance nodes before the service calling. I.e. the client can > exclude node C from the options where to send the invocation request. > If the user chooses `services('node A', 'node B', 'node F')`, where F has no > service instance, we do not invoke F. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20922) Java Thin: direct service invocation with ClientClusterGroup and Service Awareness
[ https://issues.apache.org/jira/browse/IGNITE-20922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Steshin updated IGNITE-20922: -- Description: Once we implemented Service Awareness for Java Thin Client, we might improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `IgniteClient.services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or to node B as user required. We should prevent the redirection call at #5 and call service only on A or B. It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. the client can exclude node C from the options where to send the invocation request. If the user chooses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. was: Once we implemented Service Awareness for Java Thin Client, we might improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `IgniteClient.services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or node B as user required. We should prevent the redirection call at #5 and call service only on A or B. It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. 
the client can exclude node C from the options where to send the invocation request. If user chosses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. > Java Thin: direct service invocation with ClientClusterGroup and Service > Awareness > -- > > Key: IGNITE-20922 > URL: https://issues.apache.org/jira/browse/IGNITE-20922 > Project: Ignite > Issue Type: Improvement >Reporter: Vladimir Steshin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Once we implemented Service Awareness for Java Thin Client, we might improve > the service invocation with `IgniteClient.services(ClientClusterGroup grp)` > Consider: > 1) There is a cluster with some nodes A, B, C, D, F > 1) A service is deployed on nodes A,B,C > 2) Service Awareness is enabled. > 3) User limits the service instance nodes set with > `IgniteClient.services('node A', 'node B')` skipping node C. > 4) The thin client requests the service randomly on node C because it has the > service instance too. > 5) Node C redirects the invocation to node A or to node B as user required. > We should prevent the redirection call at #5 and call service only on A or B. > It looks simple to intersect passed `ClientClusterGroup` and known set of > service instance nodes before the service calling. I.e. the client can > exclude node C from the options where to send the invocation request. > If the user chooses `services('node A', 'node B', 'node F')`, where F has no > service instance, we do not invoke F. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20922) Java Thin: direct service invocation with ClientClusterGroup and Service Awareness
[ https://issues.apache.org/jira/browse/IGNITE-20922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Steshin updated IGNITE-20922: -- Description: Once we implemented Service Awareness for Java Thin Client, we might improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or node B as user required. We should prevent the redirection call at #5 and call service only on A or B. It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. the client can exclude node C from the options where to send the invocation request. If the user chooses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. was: Once we implemented Service Awareness for Java Thin Client, we can improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or node B as user required. We should prevent the redirection call at #5 and call service only on A or B. It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. the client can exclude node C from the options where to send the invocation request. 
If user chosses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. > Java Thin: direct service invocation with ClientClusterGroup and Service > Awareness > -- > > Key: IGNITE-20922 > URL: https://issues.apache.org/jira/browse/IGNITE-20922 > Project: Ignite > Issue Type: Improvement >Reporter: Vladimir Steshin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Once we implemented Service Awareness for Java Thin Client, we might improve > the service invocation with `IgniteClient.services(ClientClusterGroup grp)` > Consider: > 1) There is a cluster with some nodes A, B, C, D, F > 1) A service is deployed on nodes A,B,C > 2) Service Awareness is enabled. > 3) User limits the service instance nodes set with `services('node A', 'node > B')` skipping node C. > 4) The thin client requests the service randomly on node C because it has the > service instance too. > 5) Node C redirects the invocation to node A or node B as user required. > We should prevent the redirection call at #5 and call service only on A or B. > It looks simple to intersect passed `ClientClusterGroup` and known set of > service instance nodes before the service calling. I.e. the client can > exclude node C from the options where to send the invocation request. > If the user chooses `services('node A', 'node B', 'node F')`, where F has no > service instance, we do not invoke F. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20922) Java Thin: direct service invocation with ClientClusterGroup and Service Awareness
[ https://issues.apache.org/jira/browse/IGNITE-20922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Steshin updated IGNITE-20922: -- Description: Once we implemented Service Awareness for Java Thin Client, we can improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or node B as user required. We should prevent the redirection call at #5 and call service only on A or B. It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. the client can exclude node C from the options where to send the invocation request. If the user chooses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. was: Once we implemented Service Awareness for Java Thin Client, we can improve the service invocation with `IgniteClient.services(ClientClusterGroup grp)` Consider: 1) There is a cluster with some nodes A, B, C, D, F 1) A service is deployed on nodes A,B,C 2) Service Awareness is enabled. 3) User limits the service instance nodes set with `services('node A', 'node B')` skipping node C. 4) The thin client requests the service randomly on node C because it has the service instance too. 5) Node C redirects the invocation to node A or node B as user required. We should prevent the redirection call at #5 and call service only on A or B. It looks simple to intersect passed `ClientClusterGroup` and known set of service instance nodes before the service calling. I.e. the client can exclude node C from the options where to send the invocation request. 
If the user chooses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. > Java Thin: direct service invocation with ClientClusterGroup and Service > Awareness > -- > > Key: IGNITE-20922 > URL: https://issues.apache.org/jira/browse/IGNITE-20922 > Project: Ignite > Issue Type: Improvement >Reporter: Vladimir Steshin >Priority: Major > > Once we implemented Service Awareness for the Java Thin Client, we can improve > service invocation with `IgniteClient.services(ClientClusterGroup grp)`. > Consider: > 1) There is a cluster with nodes A, B, C, D, F. > 2) A service is deployed on nodes A, B, C. > 3) Service Awareness is enabled. > 4) The user limits the service instance node set with `services('node A', 'node > B')`, skipping node C. > 5) The thin client requests the service randomly on node C because it has a > service instance too. > 6) Node C redirects the invocation to node A or node B, as the user required. > We should prevent the redirection call at #6 and call the service only on A or B. > It looks simple to intersect the passed `ClientClusterGroup` with the known set of > service instance nodes before calling the service, i.e. the client can > exclude node C from the options where to send the invocation request. > If the user chooses `services('node A', 'node B', 'node F')`, where F has no > service instance, we do not invoke F. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20922) Java Thin: direct service invocation with ClientClusterGroup and Service Awareness
Vladimir Steshin created IGNITE-20922: - Summary: Java Thin: direct service invocation with ClientClusterGroup and Service Awareness Key: IGNITE-20922 URL: https://issues.apache.org/jira/browse/IGNITE-20922 Project: Ignite Issue Type: Improvement Reporter: Vladimir Steshin Once we implemented Service Awareness for the Java Thin Client, we can improve service invocation with `IgniteClient.services(ClientClusterGroup grp)`. Consider: 1) There is a cluster with nodes A, B, C, D, F. 2) A service is deployed on nodes A, B, C. 3) Service Awareness is enabled. 4) The user limits the service instance node set with `services('node A', 'node B')`, skipping node C. 5) The thin client requests the service randomly on node C because it has a service instance too. 6) Node C redirects the invocation to node A or node B, as the user required. We should prevent the redirection call at #6 and call the service only on A or B. It looks simple to intersect the passed `ClientClusterGroup` with the known set of service instance nodes before calling the service, i.e. the client can exclude node C from the options where to send the invocation request. If the user chooses `services('node A', 'node B', 'node F')`, where F has no service instance, we do not invoke F. -- This message was sent by Atlassian Jira (v8.20.10#820010)
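The intersection proposed in IGNITE-20922 boils down to plain set logic on node identifiers. The sketch below is only an illustration of that idea, not actual thin-client code; the class and method names are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

public class ServiceTargetFilter {
    /**
     * Candidate nodes for a direct service call: the user-selected cluster
     * group restricted to the nodes known (via Service Awareness) to host a
     * service instance. An empty result means no selected node hosts the
     * service, and the caller must decide how to route or fail the call.
     */
    static Set<String> candidateNodes(Set<String> userGroup, Set<String> serviceTopology) {
        Set<String> result = new HashSet<>(userGroup);
        result.retainAll(serviceTopology); // drop nodes without a service instance
        return result;
    }
}
```

With the ticket's example, `candidateNodes({A, B, F}, {A, B, C})` yields `{A, B}`: node F is dropped because it has no instance, and node C was never selected by the user, so no server-side redirection is needed.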
[jira] [Updated] (IGNITE-20569) .NET: Thin 3.0: Revise logging
[ https://issues.apache.org/jira/browse/IGNITE-20569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-20569: Description: .NET client in Ignite 3.x uses custom *IIgniteLogger* interface. This approach is copied from 2.x client; it is outdated and inefficient. Revise it: * Should we add a dependency on [Microsoft.Extensions.Logging.Abstractions|https://www.nuget.org/packages/Microsoft.Extensions.Logging.Abstractions/6.0.4#dependencies-body-tab], and use standard *ILogger* instead (accept *ILoggerFactory* in config)? * Without that, how do we support structured logging, templates, code-generated loggers? * How do we integrate with popular loggers? With *ILogger* it comes out of the box. See *Logging guidance for .NET library authors*: https://learn.microsoft.com/en-us/dotnet/core/extensions/logging-library-authors was: .NET client in Ignite 3.x uses custom *IIgniteLogger* interface. This approach is copied from 2.x client; it is outdated and inefficient. Revise it: * Should we add a dependency on [Microsoft.Extensions.Logging.Abstractions|https://www.nuget.org/packages/Microsoft.Extensions.Logging.Abstractions/6.0.4#dependencies-body-tab], and use standard *ILogger* instead (accept *ILoggerFactory* in config)? * Without that, how do we support structured logging, templates, code-generated loggers? * How do we integrate with popular loggers? With *ILogger* it comes out of the box. > .NET: Thin 3.0: Revise logging > -- > > Key: IGNITE-20569 > URL: https://issues.apache.org/jira/browse/IGNITE-20569 > Project: Ignite > Issue Type: Improvement > Components: platforms, thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Blocker > Labels: .NET, ignite-3 > Fix For: 3.0.0-beta2 > > > .NET client in Ignite 3.x uses custom *IIgniteLogger* interface. This > approach is copied from 2.x client; it is outdated and inefficient. 
> Revise it: > * Should we add a dependency on > [Microsoft.Extensions.Logging.Abstractions|https://www.nuget.org/packages/Microsoft.Extensions.Logging.Abstractions/6.0.4#dependencies-body-tab], > and use standard *ILogger* instead (accept *ILoggerFactory* in config)? > * Without that, how do we support structured logging, templates, > code-generated loggers? > * How do we integrate with popular loggers? With *ILogger* it comes out of > the box. > See *Logging guidance for .NET library authors*: > https://learn.microsoft.com/en-us/dotnet/core/extensions/logging-library-authors -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-20874) Node cleanup procedure
[ https://issues.apache.org/jira/browse/IGNITE-20874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Sizov reassigned IGNITE-20874: -- Assignee: Kirill Sizov > Node cleanup procedure > -- > > Key: IGNITE-20874 > URL: https://issues.apache.org/jira/browse/IGNITE-20874 > Project: Ignite > Issue Type: Improvement >Reporter: Vladislav Pyatkov >Assignee: Kirill Sizov >Priority: Major > Labels: ignite-3 > > h3. Motivation > In the final stage, an RW transaction sends cleanup messages to all enlisted > replication groups in the transaction. Although several of these groups might > reside on the same node, that node would be notified several times. Besides, the > lock release procedure makes sense only once for a specific transaction on > the node. > h3. Definition of done > Implement a node-wide cleanup. This procedure should be triggered by a direct > message to a particular node and release all locks for a specific transaction > synchronously (before the response). The request also triggers replication > cleanup, which fixes all write intents for the specific transaction. > h3. Implementation notes > * Add a new message that ought to be named LockReleaseMessage. > * The new message has a pure network nature (not a replication request). > * The message should contain a list of replication groups (the list might be > empty) and a transaction ID. > * When the message is received, it should release all locks that are held by > the transaction, then update the lock transaction state map, start the > cleanup process over all replication groups in the list, and immediately reply > (do not wait for cleanup on replication groups). -- This message was sent by Atlassian Jira (v8.20.10#820010)
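The ordering in the implementation notes above (release locks synchronously, start replication cleanup, reply without waiting) can be sketched as follows. Only the LockReleaseMessage name comes from the ticket; the interfaces and the handler below are hypothetical stand-ins for the real transaction components:

```java
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

/** Sketch of a node-wide cleanup handler; all names except LockReleaseMessage are illustrative. */
class NodeCleanupHandler {
    interface LockManager { void releaseAll(UUID txId); }
    interface TxStateMap { void markFinished(UUID txId); }
    interface ReplicationCleanup { CompletableFuture<Void> cleanup(String groupId, UUID txId); }

    private final LockManager locks;
    private final TxStateMap txStates;
    private final ReplicationCleanup replication;

    NodeCleanupHandler(LockManager locks, TxStateMap txStates, ReplicationCleanup replication) {
        this.locks = locks;
        this.txStates = txStates;
        this.replication = replication;
    }

    /**
     * Handles a LockReleaseMessage for one transaction: locks are released
     * synchronously before the response, while write-intent cleanup on each
     * replication group is only started, never awaited.
     */
    void onLockReleaseMessage(UUID txId, List<String> groupIds, Runnable reply) {
        locks.releaseAll(txId);      // synchronous: must complete before the response
        txStates.markFinished(txId); // update the transaction state map
        for (String groupId : groupIds) {
            replication.cleanup(groupId, txId); // fire-and-forget per-group cleanup
        }
        reply.run();                 // reply immediately, without waiting for cleanup
    }
}
```

Because the reply is sent before any cleanup future completes, the coordinator unblocks quickly even when the group list is long (or empty, as the ticket allows).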
[jira] [Updated] (IGNITE-20921) Add stopGuard and busyLock to client connector classes
[ https://issues.apache.org/jira/browse/IGNITE-20921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-20921: Description: As noted in [PR review|https://github.com/apache/ignite-3/pull/2825#discussion_r1395248456], we should have *stopGuard* and *busyLock* in client connector classes, such as *ClientHandlerModule*, *ClientPrimaryReplicaTracker*, and potentially some others. > Add stopGuard and busyLock to client connector classes > -- > > Key: IGNITE-20921 > URL: https://issues.apache.org/jira/browse/IGNITE-20921 > Project: Ignite > Issue Type: Improvement > Components: thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > As noted in [PR > review|https://github.com/apache/ignite-3/pull/2825#discussion_r1395248456], > we should have *stopGuard* and *busyLock* in client connector classes, such > as *ClientHandlerModule*, *ClientPrimaryReplicaTracker*, and potentially some > others. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20921) Add stopGuard and busyLock to client connector classes
Pavel Tupitsyn created IGNITE-20921: --- Summary: Add stopGuard and busyLock to client connector classes Key: IGNITE-20921 URL: https://issues.apache.org/jira/browse/IGNITE-20921 Project: Ignite Issue Type: Improvement Components: thin client Affects Versions: 3.0.0-beta1 Reporter: Pavel Tupitsyn Assignee: Pavel Tupitsyn Fix For: 3.0.0-beta2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
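The stopGuard/busyLock idiom that IGNITE-20921 asks to apply can be sketched roughly as below. This is an illustration only: it uses a plain AtomicBoolean and ReentrantReadWriteLock in place of Ignite's internal busy-lock types, and the class name is hypothetical:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of a component guarded against operations racing with shutdown. */
class GuardedComponent {
    private final AtomicBoolean stopGuard = new AtomicBoolean();
    private final ReadWriteLock busyLock = new ReentrantReadWriteLock();

    /** Public operation: refuses to run once the component is stopping. */
    int doWork() {
        if (stopGuard.get() || !busyLock.readLock().tryLock()) {
            throw new IllegalStateException("Component is stopping");
        }
        try {
            return 42; // the actual operation runs under the busy lock
        } finally {
            busyLock.readLock().unlock();
        }
    }

    /**
     * Stop is idempotent: the stop guard ensures shutdown runs only once,
     * and taking the write lock waits for in-flight operations to drain.
     */
    void stop() {
        if (!stopGuard.compareAndSet(false, true)) {
            return; // already stopped
        }
        busyLock.writeLock().lock(); // blocks until all busy readers have left
        // release resources here; the write lock is held for good
    }
}
```

Applied to ClientHandlerModule or ClientPrimaryReplicaTracker, the same shape prevents new client requests from touching resources that a concurrent stop() is tearing down.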
[jira] [Comment Edited] (IGNITE-20919) "TransactionException: Replication is timed out" after inserting 9M entries
[ https://issues.apache.org/jira/browse/IGNITE-20919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788672#comment-17788672 ] Ivan Artiukhov edited comment on IGNITE-20919 at 11/22/23 8:29 AM: --- The similar error on reads via key-value API. See the same [^logs.zip]. {noformat} Command line: -db site.ycsb.db.ignite3.IgniteClient -t -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 1 -p operationcount=25 -p recordcount=25 -p warmupops=5 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p hosts=192.168.1.107 -p recordcount=2500 -p operationcount=2500 -s YCSB Client 2023.8Loading workload... Data integrity is enabled. Starting test. 2023-11-21 19:06:29:570 [WARM-UP] 0 sec: 0 operations; est completion in 0 second 2023-11-21 19:06:29:570 [WARM-UP] 0 sec: 0 operations; est completion in 0 second [19:06:29][INFO ][Thread-2] Create table request: CREATE TABLE IF NOT EXISTS usertable (yscb_key VARCHAR PRIMARY KEY, field0 VARCHAR, field1 VARCHAR, field2 VARCHAR, field3 VARCHAR, field4 VARCHAR, field5 VARCHAR, field6 VARCHAR, field7 VARCHAR, field8 VARCHAR, field9 VARCHAR) DBWrapper: report latency for each error is false and specific error codes to track for latency are: [] 2023-11-21 19:06:30:568 [WARM-UP] 1 sec: 1447 operations; 1447 current ops/sec; est completion in 4 hours 48 minutes 2023-11-21 19:06:30:568 [WARM-UP] 1 sec: 1447 operations; 1447 current ops/sec; est completion in 4 hours 48 minutes ... ... 
2023-11-21 19:50:37:567 [PAYLOAD] 2648 sec: 11867826 operations; 4573 current ops/sec; est completion in 49 minutes [READ AverageLatency(us)=246.56] [VERIFY AverageLatency(us)=2.87] [READ-FAILED AverageLatency(us)=191.82] 2023-11-21 19:50:37:567 [PAYLOAD] 2648 sec: 11867826 operations; 4573 current ops/sec; est completion in 49 minutes [READ AverageLatency(us)=246.56] [VERIFY AverageLatency(us)=2.87] [READ-FAILED AverageLatency(us)=191.82] 2023-11-21 19:50:38:567 [PAYLOAD] 2649 sec: 11868355 operations; 529 current ops/sec; est completion in 49 minutes [READ AverageLatency(us)=518.38] [VERIFY AverageLatency(us)=3.23] [READ-FAILED AverageLatency(us)=695.92] 2023-11-21 19:50:38:567 [PAYLOAD] 2649 sec: 11868355 operations; 529 current ops/sec; est completion in 49 minutes [READ AverageLatency(us)=518.38] [VERIFY AverageLatency(us)=3.23] [READ-FAILED AverageLatency(us)=695.92] 2023-11-21 19:50:39:567 [PAYLOAD] 2650 sec: 11868355 operations; 0 current ops/sec; est completion in 49 minutes 2023-11-21 19:50:39:567 [PAYLOAD] 2650 sec: 11868355 operations; 0 current ops/sec; est completion in 49 minutes 2023-11-21 19:50:40:567 [PAYLOAD] 2651 sec: 11868355 operations; 0 current ops/sec; est completion in 49 minutes 2023-11-21 19:50:40:567 [PAYLOAD] 2651 sec: 11868355 operations; 0 current ops/sec; est completion in 49 minutes ... ... 
2023-11-21 19:51:37:567 [PAYLOAD] 2708 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes 2023-11-21 19:51:38:567 [PAYLOAD] 2709 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes 2023-11-21 19:51:38:567 [PAYLOAD] 2709 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes 2023-11-21 19:51:39:567 [PAYLOAD] 2710 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes 2023-11-21 19:51:39:567 [PAYLOAD] 2710 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes [19:51:40][ERROR][Thread-2] Error reading key: user3512309439425897761 org.apache.ignite.lang.IgniteException: The primary replica await timed out [replicationGroupId=5_part_19, referenceTimestamp=HybridTimestamp [physical=2023-11-21 19:51:10:466 +0300, logical=7, composite=111449569392459783], currentLease=Lease [leaseholder=poc-tester-SERVER-192.168.1.107-id-0, leaseholderId=0ea1764c-7664-4b61-83da-b7ed592082ae, accepted=false, startTime=HybridTimestamp [physical=2023-11-21 19:51:09:453 +0300, logical=46, composite=111449569326071854], expirationTime=HybridTimestamp [physical=2023-11-21 19:53:09:453 +0300, logical=0, composite=111449577190391808], prolongable=false, replicationGroupId=5_part_19]] at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?] at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:754) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:688) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.ClientUtils.copyExceptionWithCauseIfPossible(ClientUtils.java:73) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at
[jira] [Commented] (IGNITE-20919) "TransactionException: Replication is timed out" after inserting 9M entries
[ https://issues.apache.org/jira/browse/IGNITE-20919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788672#comment-17788672 ] Ivan Artiukhov commented on IGNITE-20919: - The similar error on reads via key-value API. See the same [^logs.zip]. {noformat} Command line: -db site.ycsb.db.ignite3.IgniteClient -t -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 1 -p operationcount=25 -p recordcount=25 -p warmupops=5 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p hosts=192.168.1.107 -p recordcount=2500 -p operationcount=2500 -s YCSB Client 2023.8Loading workload... Data integrity is enabled. Starting test. 2023-11-21 19:06:29:570 [WARM-UP] 0 sec: 0 operations; est completion in 0 second 2023-11-21 19:06:29:570 [WARM-UP] 0 sec: 0 operations; est completion in 0 second [19:06:29][INFO ][Thread-2] Create table request: CREATE TABLE IF NOT EXISTS usertable (yscb_key VARCHAR PRIMARY KEY, field0 VARCHAR, field1 VARCHAR, field2 VARCHAR, field3 VARCHAR, field4 VARCHAR, field5 VARCHAR, field6 VARCHAR, field7 VARCHAR, field8 VARCHAR, field9 VARCHAR) DBWrapper: report latency for each error is false and specific error codes to track for latency are: [] 2023-11-21 19:06:30:568 [WARM-UP] 1 sec: 1447 operations; 1447 current ops/sec; est completion in 4 hours 48 minutes 2023-11-21 19:06:30:568 [WARM-UP] 1 sec: 1447 operations; 1447 current ops/sec; est completion in 4 hours 48 minutes ... ... 
2023-11-21 19:50:37:567 [PAYLOAD] 2648 sec: 11867826 operations; 4573 current ops/sec; est completion in 49 minutes [READ AverageLatency(us)=246.56] [VERIFY AverageLatency(us)=2.87] [READ-FAILED AverageLatency(us)=191.82] 2023-11-21 19:50:37:567 [PAYLOAD] 2648 sec: 11867826 operations; 4573 current ops/sec; est completion in 49 minutes [READ AverageLatency(us)=246.56] [VERIFY AverageLatency(us)=2.87] [READ-FAILED AverageLatency(us)=191.82] 2023-11-21 19:50:38:567 [PAYLOAD] 2649 sec: 11868355 operations; 529 current ops/sec; est completion in 49 minutes [READ AverageLatency(us)=518.38] [VERIFY AverageLatency(us)=3.23] [READ-FAILED AverageLatency(us)=695.92] 2023-11-21 19:50:38:567 [PAYLOAD] 2649 sec: 11868355 operations; 529 current ops/sec; est completion in 49 minutes [READ AverageLatency(us)=518.38] [VERIFY AverageLatency(us)=3.23] [READ-FAILED AverageLatency(us)=695.92] 2023-11-21 19:50:39:567 [PAYLOAD] 2650 sec: 11868355 operations; 0 current ops/sec; est completion in 49 minutes 2023-11-21 19:50:39:567 [PAYLOAD] 2650 sec: 11868355 operations; 0 current ops/sec; est completion in 49 minutes 2023-11-21 19:50:40:567 [PAYLOAD] 2651 sec: 11868355 operations; 0 current ops/sec; est completion in 49 minutes 2023-11-21 19:50:40:567 [PAYLOAD] 2651 sec: 11868355 operations; 0 current ops/sec; est completion in 49 minutes ... ... 
2023-11-21 19:51:37:567 [PAYLOAD] 2708 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes 2023-11-21 19:51:38:567 [PAYLOAD] 2709 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes 2023-11-21 19:51:38:567 [PAYLOAD] 2709 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes 2023-11-21 19:51:39:567 [PAYLOAD] 2710 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes 2023-11-21 19:51:39:567 [PAYLOAD] 2710 sec: 11868394 operations; 0 current ops/sec; est completion in 50 minutes [19:51:40][ERROR][Thread-2] Error reading key: user3512309439425897761 org.apache.ignite.lang.IgniteException: The primary replica await timed out [replicationGroupId=5_part_19, referenceTimestamp=HybridTimestamp [physical=2023-11-21 19:51:10:466 +0300, logical=7, composite=111449569392459783], currentLease=Lease [leaseholder=poc-tester-SERVER-192.168.1.107-id-0, leaseholderId=0ea1764c-7664-4b61-83da-b7ed592082ae, accepted=false, startTime=HybridTimestamp [physical=2023-11-21 19:51:09:453 +0300, logical=46, composite=111449569326071854], expirationTime=HybridTimestamp [physical=2023-11-21 19:53:09:453 +0300, logical=0, composite=111449577190391808], prolongable=false, replicationGroupId=5_part_19]] at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?] at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:754) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:688) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.ClientUtils.copyExceptionWithCauseIfPossible(ClientUtils.java:73) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.ClientUtils.ensurePublicException(ClientUtils.java:54)
[jira] [Updated] (IGNITE-20920) .NET: TestPutRoutesRequestToPrimaryNode is flaky
[ https://issues.apache.org/jira/browse/IGNITE-20920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-20920: Description: Test history: https://ci.ignite.apache.org/test/6220606330906984960?currentProjectId=ApacheIgnite3xGradle_Test=true Error: {code} Apache.Ignite.IgniteClientConnectionException : Exception while reading from socket, connection closed: Connection lost (failed to read data from socket) > Apache.Ignite.IgniteClientConnectionException : Connection lost (failed to read data from socket) > System.Net.Sockets.SocketException : Software caused connection abort at Apache.Ignite.Internal.ClientFailoverSocket.DoOutInOpAndGetSocketAsync(ClientOp clientOp, Transaction tx, PooledArrayBuffer request, PreferredNode preferredNode, IRetryPolicy retryPolicyOverride) in /opt/buildagent/work/b8d4df1365f1f1e5/modules/platforms/dotnet/Apache.Ignite/Internal/ClientFailoverSocket.cs:line 195 at Apache.Ignite.Internal.Table.RecordView`1.DoOutInOpAsync(ClientOp clientOp, Transaction tx, PooledArrayBuffer request, PreferredNode preferredNode, IRetryPolicy retryPolicyOverride) in /opt/buildagent/work/b8d4df1365f1f1e5/modules/platforms/dotnet/Apache.Ignite/Internal/Table/RecordView.cs:line 400 at Apache.Ignite.Internal.Table.RecordView`1.DoRecordOutOpAsync(ClientOp op, ITransaction transaction, T record, Boolean keyOnly, Nullable`1 schemaVersionOverride) in /opt/buildagent/work/b8d4df1365f1f1e5/modules/platforms/dotnet/Apache.Ignite/Internal/Table/RecordView.cs:line 425 at Apache.Ignite.Internal.Table.RecordView`1.UpsertAsync(ITransaction transaction, T record) in /opt/buildagent/work/b8d4df1365f1f1e5/modules/platforms/dotnet/Apache.Ignite/Internal/Table/RecordView.cs:line 172 at Apache.Ignite.Tests.PartitionAwarenessRealClusterTests.TestPutRoutesRequestToPrimaryNode() in /opt/buildagent/work/b8d4df1365f1f1e5/modules/platforms/dotnet/Apache.Ignite.Tests/PartitionAwarenessRealClusterTests.cs:line 65 at 
NUnit.Framework.Internal.TaskAwaitAdapter.GenericAdapter`1.BlockUntilCompleted() at NUnit.Framework.Internal.MessagePumpStrategy.NoMessagePumpStrategy.WaitForCompletion(AwaitAdapter awaiter) at NUnit.Framework.Internal.AsyncToSyncAdapter.Await(Func`1 invoke) at NUnit.Framework.Internal.Commands.TestMethodCommand.RunTestMethod(TestExecutionContext context) at NUnit.Framework.Internal.Commands.TestMethodCommand.Execute(TestExecutionContext context) at NUnit.Framework.Internal.Commands.BeforeAndAfterTestCommand.<>c__DisplayClass1_0.b__0() at NUnit.Framework.Internal.Commands.DelegatingTestCommand.RunTestMethodInThreadAbortSafeZone(TestExecutionContext context, Action action) --IgniteClientConnectionException at Apache.Ignite.Internal.ClientSocket.ReceiveBytesAsync(Stream stream, Byte[] buffer, Int32 size, CancellationToken cancellationToken) in /opt/buildagent/work/b8d4df1365f1f1e5/modules/platforms/dotnet/Apache.Ignite/Internal/ClientSocket.cs:line 498 at Apache.Ignite.Internal.ClientSocket.ReadMessageSizeAsync(Stream stream, Byte[] buffer, CancellationToken cancellationToken) in /opt/buildagent/work/b8d4df1365f1f1e5/modules/platforms/dotnet/Apache.Ignite/Internal/ClientSocket.cs:line 478 at Apache.Ignite.Internal.ClientSocket.ReadResponseAsync(Stream stream, Byte[] messageSizeBytes, CancellationToken cancellationToken) in /opt/buildagent/work/b8d4df1365f1f1e5/modules/platforms/dotnet/Apache.Ignite/Internal/ClientSocket.cs:line 452 at Apache.Ignite.Internal.ClientSocket.RunReceiveLoop(CancellationToken cancellationToken) in /opt/buildagent/work/b8d4df1365f1f1e5/modules/platforms/dotnet/Apache.Ignite/Internal/ClientSocket.cs:line 700 {code} > .NET: TestPutRoutesRequestToPrimaryNode is flaky > > > Key: IGNITE-20920 > URL: https://issues.apache.org/jira/browse/IGNITE-20920 > Project: Ignite > Issue Type: Bug > Components: platforms, thin client >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Test 
history: > https://ci.ignite.apache.org/test/6220606330906984960?currentProjectId=ApacheIgnite3xGradle_Test=true > Error: > {code} > Apache.Ignite.IgniteClientConnectionException : Exception while reading from > socket, connection closed: Connection lost (failed to read data from socket) > > Apache.Ignite.IgniteClientConnectionException : Connection lost > (failed to read data from socket) > > System.Net.Sockets.SocketException : Software caused connection abort >at > Apache.Ignite.Internal.ClientFailoverSocket.DoOutInOpAndGetSocketAsync(ClientOp > clientOp, Transaction tx, PooledArrayBuffer request, PreferredNode > preferredNode, IRetryPolicy retryPolicyOverride) in >
[jira] [Created] (IGNITE-20920) .NET: TestPutRoutesRequestToPrimaryNode is flaky
Pavel Tupitsyn created IGNITE-20920: --- Summary: .NET: TestPutRoutesRequestToPrimaryNode is flaky Key: IGNITE-20920 URL: https://issues.apache.org/jira/browse/IGNITE-20920 Project: Ignite Issue Type: Bug Components: platforms, thin client Reporter: Pavel Tupitsyn Assignee: Pavel Tupitsyn Fix For: 3.0.0-beta2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20919) "TransactionException: Replication is timed out" after inserting 9M entries
Ivan Artiukhov created IGNITE-20919: --- Summary: "TransactionException: Replication is timed out" after inserting 9M entries Key: IGNITE-20919 URL: https://issues.apache.org/jira/browse/IGNITE-20919 Project: Ignite Issue Type: Bug Components: persistence Reporter: Ivan Artiukhov Attachments: logs.zip AI3, rev. e1c9b1c4cf589c71aecc4815a3c8a14ae8fbf2f3 (Nov 21 2023) Benchmark: [https://github.com/gridgain/YCSB/blob/ycsb-2023.8/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java] The benchmark uses key-value API to put/get entries. h1. Setup 1 server node aipersist, 25 partitions, raft.fsync=false h1. Steps Run a single instance of the benchmark in preload mode with 1 thread. Number of unique entries – 25 million. {noformat} Command line: -db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 1 -p recordcount=25 -p warmupops=5 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p hosts=192.168.1.107 -p recordcount=2500 -p operationcount=2500 -s {noformat} h1. Expected result All 25 million entries were loaded without errors. h1. Actual result {{TransactionException: Replication is timed out after inserting 9.1 million entries:}} {noformat} Starting test. 2023-11-21 18:11:29:396 [WARM-UP] 0 sec: 0 operations; est completion in 0 second ... 
2023-11-21 19:06:21:394 [PAYLOAD] 3292 sec: 9151900 operations; 1880 current ops/sec; est completion in 1 hour 35 minutes [INSERT AverageLatency(us)=522.76] 2023-11-21 19:06:21:394 [PAYLOAD] 3292 sec: 9151900 operations; 1880 current ops/sec; est completion in 1 hour 35 minutes [INSERT AverageLatency(us)=522.76] 2023-11-21 19:06:22:394 [PAYLOAD] 3293 sec: 9153100 operations; 1200 current ops/sec; est completion in 1 hour 35 minutes [INSERT AverageLatency(us)=531.8] 2023-11-21 19:06:22:394 [PAYLOAD] 3293 sec: 9153100 operations; 1200 current ops/sec; est completion in 1 hour 35 minutes [INSERT AverageLatency(us)=531.8] 2023-11-21 19:06:23:394 [PAYLOAD] 3294 sec: 9153100 operations; 0 current ops/sec; est completion in 1 hour 35 minutes 2023-11-21 19:06:23:394 [PAYLOAD] 3294 sec: 9153100 operations; 0 current ops/sec; est completion in 1 hour 35 minutes 2023-11-21 19:06:24:394 [PAYLOAD] 3295 sec: 9153100 operations; 0 current ops/sec; est completion in 1 hour 35 minutes 2023-11-21 19:06:24:394 [PAYLOAD] 3295 sec: 9153100 operations; 0 current ops/sec; est completion in 1 hour 35 minutes [19:06:25][ERROR][Thread-2] Error inserting key: user3519618738173805240 org.apache.ignite.tx.TransactionException: Replication is timed out [replicaGrpId=5_part_19] at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?] at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:754) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:688) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.ClientUtils.copyExceptionWithCauseIfPossible(ClientUtils.java:73) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] 
at org.apache.ignite.internal.client.ClientUtils.ensurePublicException(ClientUtils.java:54) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.ClientUtils.sync(ClientUtils.java:97) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.table.ClientKeyValueBinaryView.put(ClientKeyValueBinaryView.java:168) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.table.ClientKeyValueBinaryView.put(ClientKeyValueBinaryView.java:47) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at site.ycsb.db.ignite3.IgniteClient.insert(IgniteClient.java:127) [ignite3-binding-2023.8.jar:?] at site.ycsb.DBWrapper.insert(DBWrapper.java:237) [core-2023.8.jar:?] at site.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:623) [core-2023.8.jar:?] at site.ycsb.ClientThread.run(ClientThread.java:167) [core-2023.8.jar:?] at java.lang.Thread.run(Thread.java:829) [?:?] Caused by: org.apache.ignite.tx.TransactionException: Replication is timed out [replicaGrpId=5_part_19] at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?] at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:754) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:688) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at
[jira] [Assigned] (IGNITE-20723) Tests fail on TC because a primary replica is not assigned or does not respond
[ https://issues.apache.org/jira/browse/IGNITE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladislav Pyatkov reassigned IGNITE-20723: -- Assignee: (was: Vladislav Pyatkov) > Tests fail on TC because a primary replica is not assigned or does not respond > -- > > Key: IGNITE-20723 > URL: https://issues.apache.org/jira/browse/IGNITE-20723 > Project: Ignite > Issue Type: Bug >Reporter: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > Attachments: _Integration_Tests_Module_Runner_SQL_Logic_11804.log.zip > > > TC run is available > [here|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_IntegrationTests_ModuleRunnerSqlLogic/7584713?hideProblemsFromDependencies=false=false=true=true=true]. > By my brief analysis, the issue is somewhere in the assignments: > {noformat} > [2023-10-06T13:03:51,231][INFO > ][%sqllogic0%Raft-Group-Client-1][PlacementDriverManager] Placement driver > active actor is starting. > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Received > LeaseGrantedMessage for replica belonging to group=291_part_3, force=false > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Waiting for actual > storage state, group=291_part_3 > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%JRaft-Request-Processor-5][ReplicaManager] Lease accepted, > group=291_part_3, leaseStartTime=HybridTimestamp [time=88228107862067, > physical=1696597718931, logical=51], leaseExpirationTime=HybridTimestamp > [time=88235972182016, physical=1696597838931, logical=0] > [2023-10-06T13:08:50,256][WARN > ][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable error > during the request type=ActionRequestImpl occurred (will be retried on the > randomly selected node): > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) > 
~[?:?] > at > java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) > ~[?:?] > at > java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) > ~[?:?] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > [?:?] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > [?:?] > at > java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792) > [?:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: java.util.concurrent.TimeoutException > ... 
7 more > at > app//org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:42) > at app//org.junit.jupiter.api.Assertions.fail(Assertions.java:147) > at > app//org.apache.ignite.internal.sqllogic.Statement.execute(Statement.java:112) > at > app//org.apache.ignite.internal.sqllogic.SqlScriptRunner.run(SqlScriptRunner.java:70) > at > app//org.junit.jupiter.api.AssertTimeoutPreemptively.lambda$assertTimeoutPreemptively$0(AssertTimeoutPreemptively.java:48) > at > app//org.junit.jupiter.api.AssertTimeoutPreemptively.lambda$submitTask$3(AssertTimeoutPreemptively.java:95) > at > java.base@11.0.17/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base@11.0.17/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.ignite.tx.TransactionException: IGN-REP-3 > TraceId:0002850d-2b24-4356-a0dc-2c25af4202a1 > org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: > IGN-REP-3 TraceId:0002850d-2b24-4356-a0dc-2c25af4202a1 Replication is timed > out [replicaGrpId=291_part_3] > at > java.base@11.0.17/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) > at > app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772) > at >
[jira] [Assigned] (IGNITE-20723) Tests fail on TC because a primary replica is not assigned or does not respond
[ https://issues.apache.org/jira/browse/IGNITE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladislav Pyatkov reassigned IGNITE-20723: -- Assignee: Vladislav Pyatkov > Tests fail on TC because a primary replica is not assigned or does not respond > -- > > Key: IGNITE-20723 > URL: https://issues.apache.org/jira/browse/IGNITE-20723 > Project: Ignite > Issue Type: Bug >Reporter: Vladislav Pyatkov >Assignee: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > Attachments: _Integration_Tests_Module_Runner_SQL_Logic_11804.log.zip > > > TC run is available > [here|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_IntegrationTests_ModuleRunnerSqlLogic/7584713?hideProblemsFromDependencies=false=false=true=true=true]. > By my brief analysis, the issue is somewhere in the assignments: > {noformat} > [2023-10-06T13:03:51,231][INFO > ][%sqllogic0%Raft-Group-Client-1][PlacementDriverManager] Placement driver > active actor is starting. > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Received > LeaseGrantedMessage for replica belonging to group=291_part_3, force=false > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Waiting for actual > storage state, group=291_part_3 > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%JRaft-Request-Processor-5][ReplicaManager] Lease accepted, > group=291_part_3, leaseStartTime=HybridTimestamp [time=88228107862067, > physical=1696597718931, logical=51], leaseExpirationTime=HybridTimestamp > [time=88235972182016, physical=1696597838931, logical=0] > [2023-10-06T13:08:50,256][WARN > ][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable error > during the request type=ActionRequestImpl occurred (will be retried on the > randomly selected node): > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > 
java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) > ~[?:?] > at > java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) > ~[?:?] > at > java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) > ~[?:?] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > [?:?] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > [?:?] > at > java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792) > [?:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: java.util.concurrent.TimeoutException > ... 
7 more > at > app//org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:42) > at app//org.junit.jupiter.api.Assertions.fail(Assertions.java:147) > at > app//org.apache.ignite.internal.sqllogic.Statement.execute(Statement.java:112) > at > app//org.apache.ignite.internal.sqllogic.SqlScriptRunner.run(SqlScriptRunner.java:70) > at > app//org.junit.jupiter.api.AssertTimeoutPreemptively.lambda$assertTimeoutPreemptively$0(AssertTimeoutPreemptively.java:48) > at > app//org.junit.jupiter.api.AssertTimeoutPreemptively.lambda$submitTask$3(AssertTimeoutPreemptively.java:95) > at > java.base@11.0.17/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base@11.0.17/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.ignite.tx.TransactionException: IGN-REP-3 > TraceId:0002850d-2b24-4356-a0dc-2c25af4202a1 > org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: > IGN-REP-3 TraceId:0002850d-2b24-4356-a0dc-2c25af4202a1 Replication is timed > out [replicaGrpId=291_part_3] > at > java.base@11.0.17/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) > at >
[jira] [Assigned] (IGNITE-20723) Tests fail on TC because a primary replica is not assigned or does not respond
[ https://issues.apache.org/jira/browse/IGNITE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladislav Pyatkov reassigned IGNITE-20723: -- Assignee: (was: Vladislav Pyatkov) > Tests fail on TC because a primary replica is not assigned or does not respond > -- > > Key: IGNITE-20723 > URL: https://issues.apache.org/jira/browse/IGNITE-20723 > Project: Ignite > Issue Type: Bug >Reporter: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > Attachments: _Integration_Tests_Module_Runner_SQL_Logic_11804.log.zip > > > TC run is available > [here|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_IntegrationTests_ModuleRunnerSqlLogic/7584713?hideProblemsFromDependencies=false=false=true=true=true]. > By my brief analysis, the issue is somewhere in the assignments: > {noformat} > [2023-10-06T13:03:51,231][INFO > ][%sqllogic0%Raft-Group-Client-1][PlacementDriverManager] Placement driver > active actor is starting. > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Received > LeaseGrantedMessage for replica belonging to group=291_part_3, force=false > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Waiting for actual > storage state, group=291_part_3 > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%JRaft-Request-Processor-5][ReplicaManager] Lease accepted, > group=291_part_3, leaseStartTime=HybridTimestamp [time=88228107862067, > physical=1696597718931, logical=51], leaseExpirationTime=HybridTimestamp > [time=88235972182016, physical=1696597838931, logical=0] > [2023-10-06T13:08:50,256][WARN > ][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable error > during the request type=ActionRequestImpl occurred (will be retried on the > randomly selected node): > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) > 
~[?:?] > at > java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) > ~[?:?] > at > java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) > ~[?:?] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > [?:?] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > [?:?] > at > java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792) > [?:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: java.util.concurrent.TimeoutException > ... 
7 more > at > app//org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:42) > at app//org.junit.jupiter.api.Assertions.fail(Assertions.java:147) > at > app//org.apache.ignite.internal.sqllogic.Statement.execute(Statement.java:112) > at > app//org.apache.ignite.internal.sqllogic.SqlScriptRunner.run(SqlScriptRunner.java:70) > at > app//org.junit.jupiter.api.AssertTimeoutPreemptively.lambda$assertTimeoutPreemptively$0(AssertTimeoutPreemptively.java:48) > at > app//org.junit.jupiter.api.AssertTimeoutPreemptively.lambda$submitTask$3(AssertTimeoutPreemptively.java:95) > at > java.base@11.0.17/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base@11.0.17/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base@11.0.17/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.ignite.tx.TransactionException: IGN-REP-3 > TraceId:0002850d-2b24-4356-a0dc-2c25af4202a1 > org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: > IGN-REP-3 TraceId:0002850d-2b24-4356-a0dc-2c25af4202a1 Replication is timed > out [replicaGrpId=291_part_3] > at > java.base@11.0.17/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) > at > app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:772) > at >
[jira] [Assigned] (IGNITE-20918) Leases expire after a node has been restarted
[ https://issues.apache.org/jira/browse/IGNITE-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev reassigned IGNITE-20918: Assignee: Alexander Lapin > Leases expire after a node has been restarted > - > > Key: IGNITE-20918 > URL: https://issues.apache.org/jira/browse/IGNITE-20918 > Project: Ignite > Issue Type: Bug >Reporter: Aleksandr Polovtcev >Assignee: Alexander Lapin >Priority: Critical > Labels: ignite-3 > > IGNITE-20910 introduces a test that inserts some data after restarting a > node. For some reason, after some time, I can see the following messages in > the log: > {noformat} > [2023-11-22T10:00:17,056][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_19] > [2023-11-22T10:00:17,057][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_0] > [2023-11-22T10:00:17,057][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_9] > [2023-11-22T10:00:17,057][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_10] > {noformat} > After that, the test fails with a {{PrimaryReplicaMissException}}. The > problem here is that it is expected that a single node should never have > expired leases; they should be prolonged automatically. I think that this > happens because the initial lease that was issued before the node was > restarted is still accepted by the node after restart.
[jira] [Comment Edited] (IGNITE-20723) Tests fail on TC because a primary replica is not assigned or does not respond
[ https://issues.apache.org/jira/browse/IGNITE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788665#comment-17788665 ] Vladislav Pyatkov edited comment on IGNITE-20723 at 11/22/23 8:12 AM: -- The test failure is not related to the placement driver, because cluster nodes cannot join due to a problem in the discovery procedure. The root cause of the issue is in ScaleCube. {noformat} [2023-10-06T13:09:05,844][WARN ][sc-cluster-3345-2][MetadataStore] [default:sqllogic1:6745409cb23745f0@10.233.107.199:3345][a4e654b1-b252-4076-bef6-c244f6043163] Timeout getting GetMetadataResp from 10.233.107.199:3344 within 1000 ms, cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured) [2023-10-06T13:09:05,844][WARN ][sc-cluster-3345-2][MembershipProtocol] [default:sqllogic1:6745409cb23745f0@10.233.107.199:3345][updateMembership][MEMBERSHIP_GOSSIP] Skipping to add/update member: {m: default:sqllogic0:1c57e22f10fd47df@10.233.107.199:3344, s: ALIVE, inc: 6}, due to failed fetchMetadata call (cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured)) {noformat} [~rpuch] is aware of the issue and participated in the investigation. was (Author: v.pyatkov): The test failure is not related to the placement driver, because cluster nodes cannot join due to a problem in the discovery procedure. The root cause of the issue is in ScaleCube. 
{noformat} [2023-11-17T00:20:22,152][WARN ][sc-cluster-3345-2][MetadataStore] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][0ccd29d5-2fc2-449c-942c-ae42760adf7d] Timeout getting GetMetadataResp from 10.233.107.205:3344 within 1000 ms, cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured) [2023-11-17T00:20:22,153][WARN ][sc-cluster-3345-2][MembershipProtocol] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][updateMembership][MEMBERSHIP_GOSSIP] Skipping to add/update member: {m: default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344, s: ALIVE, inc: 9}, due to failed fetchMetadata call (cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured)) {noformat} [~rpuch] is aware of the issue and participated in the investigation. > Tests fail on TC because a primary replica is not assigned or does not respond > -- > > Key: IGNITE-20723 > URL: https://issues.apache.org/jira/browse/IGNITE-20723 > Project: Ignite > Issue Type: Bug >Reporter: Vladislav Pyatkov >Assignee: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > Attachments: _Integration_Tests_Module_Runner_SQL_Logic_11804.log.zip > > > TC run is available > [here|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_IntegrationTests_ModuleRunnerSqlLogic/7584713?hideProblemsFromDependencies=false=false=true=true=true]. > By my brief analysis, the issue is somewhere in the assignments: > {noformat} > [2023-10-06T13:03:51,231][INFO > ][%sqllogic0%Raft-Group-Client-1][PlacementDriverManager] Placement driver > active actor is starting. 
> [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Received > LeaseGrantedMessage for replica belonging to group=291_part_3, force=false > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Waiting for actual > storage state, group=291_part_3 > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%JRaft-Request-Processor-5][ReplicaManager] Lease accepted, > group=291_part_3, leaseStartTime=HybridTimestamp [time=88228107862067, > physical=1696597718931, logical=51], leaseExpirationTime=HybridTimestamp > [time=88235972182016, physical=1696597838931, logical=0] > [2023-10-06T13:08:50,256][WARN > ][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable error > during the request type=ActionRequestImpl occurred (will be retried on the > randomly selected node): > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) > ~[?:?] > at > java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) > ~[?:?] > at > java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) > ~[?:?] > at >
[jira] [Commented] (IGNITE-20723) Tests fail on TC because a primary replica is not assigned or does not respond
[ https://issues.apache.org/jira/browse/IGNITE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788665#comment-17788665 ] Vladislav Pyatkov commented on IGNITE-20723: The test failure is not related to the placement driver, because cluster nodes cannot join due to a problem in the discovery procedure. The root cause of the issue is in ScaleCube. {noformat} [2023-11-17T00:20:22,152][WARN ][sc-cluster-3345-2][MetadataStore] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][0ccd29d5-2fc2-449c-942c-ae42760adf7d] Timeout getting GetMetadataResp from 10.233.107.205:3344 within 1000 ms, cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured) [2023-11-17T00:20:22,153][WARN ][sc-cluster-3345-2][MembershipProtocol] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][updateMembership][MEMBERSHIP_GOSSIP] Skipping to add/update member: {m: default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344, s: ALIVE, inc: 9}, due to failed fetchMetadata call (cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured)) {noformat} [~rpuch] is aware of the issue and participated in the investigation. > Tests fail on TC because a primary replica is not assigned or does not respond > -- > > Key: IGNITE-20723 > URL: https://issues.apache.org/jira/browse/IGNITE-20723 > Project: Ignite > Issue Type: Bug >Reporter: Vladislav Pyatkov >Assignee: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > Attachments: _Integration_Tests_Module_Runner_SQL_Logic_11804.log.zip > > > TC run is available > [here|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_IntegrationTests_ModuleRunnerSqlLogic/7584713?hideProblemsFromDependencies=false=false=true=true=true]. 
> By my brief analysis, the issue is somewhere in the assignments: > {noformat} > [2023-10-06T13:03:51,231][INFO > ][%sqllogic0%Raft-Group-Client-1][PlacementDriverManager] Placement driver > active actor is starting. > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Received > LeaseGrantedMessage for replica belonging to group=291_part_3, force=false > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Waiting for actual > storage state, group=291_part_3 > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%JRaft-Request-Processor-5][ReplicaManager] Lease accepted, > group=291_part_3, leaseStartTime=HybridTimestamp [time=88228107862067, > physical=1696597718931, logical=51], leaseExpirationTime=HybridTimestamp > [time=88235972182016, physical=1696597838931, logical=0] > [2023-10-06T13:08:50,256][WARN > ][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable error > during the request type=ActionRequestImpl occurred (will be retried on the > randomly selected node): > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) > ~[?:?] > at > java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) > ~[?:?] > at > java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) > ~[?:?] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > [?:?] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > [?:?] > at > java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792) > [?:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] 
> at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: java.util.concurrent.TimeoutException > ... 7 more > at > app//org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:42) > at app//org.junit.jupiter.api.Assertions.fail(Assertions.java:147) > at > app//org.apache.ignite.internal.sqllogic.Statement.execute(Statement.java:112) > at >
[jira] [Updated] (IGNITE-20918) Leases expire after a node has been restarted
[ https://issues.apache.org/jira/browse/IGNITE-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20918: - Description: IGNITE-20910 introduces a test that inserts some data after restarting a node. For some reason, after some time, I can see the following messages in the log: {noformat} [2023-11-22T10:00:17,056][INFO ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] Primary replica expired [grp=5_part_19] [2023-11-22T10:00:17,057][INFO ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] Primary replica expired [grp=5_part_0] [2023-11-22T10:00:17,057][INFO ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] Primary replica expired [grp=5_part_9] [2023-11-22T10:00:17,057][INFO ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] Primary replica expired [grp=5_part_10] {noformat} After that, the test fails with a {{PrimaryReplicaMissException}}. The problem here is that it is expected that a single node should never have expired leases; they should be prolonged automatically. I think that this happens because the initial lease that was issued before the node was restarted is still accepted by the node after restart. > Leases expire after a node has been restarted > - > > Key: IGNITE-20918 > URL: https://issues.apache.org/jira/browse/IGNITE-20918 > Project: Ignite > Issue Type: Bug >Reporter: Aleksandr Polovtcev >Priority: Critical > Labels: ignite-3 > > IGNITE-20910 introduces a test that inserts some data after restarting a > node. 
For some reason, after some time, I can see the following messages in > the log: > {noformat} > [2023-11-22T10:00:17,056][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_19] > [2023-11-22T10:00:17,057][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_0] > [2023-11-22T10:00:17,057][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_9] > [2023-11-22T10:00:17,057][INFO > ][%isnt_tmpar_0%metastorage-watch-executor-3][PartitionReplicaListener] > Primary replica expired [grp=5_part_10] > {noformat} > After that, the test fails with a {{PrimaryReplicaMissException}}. The > problem here is that it is expected that a single node should never have > expired leases; they should be prolonged automatically. I think that this > happens because the initial lease that was issued before the node was > restarted is still accepted by the node after restart.
[jira] [Comment Edited] (IGNITE-20723) Tests fail on TC because a primary replica is not assigned or does not respond
[ https://issues.apache.org/jira/browse/IGNITE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788665#comment-17788665 ] Vladislav Pyatkov edited comment on IGNITE-20723 at 11/22/23 8:10 AM: -- The test failure is not related to the placement driver, because cluster nodes cannot join due to a problem in the discovery procedure. The root cause of the issue is in ScaleCube. {noformat} [2023-11-17T00:20:22,152][WARN ][sc-cluster-3345-2][MetadataStore] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][0ccd29d5-2fc2-449c-942c-ae42760adf7d] Timeout getting GetMetadataResp from 10.233.107.205:3344 within 1000 ms, cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured) [2023-11-17T00:20:22,153][WARN ][sc-cluster-3345-2][MembershipProtocol] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][updateMembership][MEMBERSHIP_GOSSIP] Skipping to add/update member: {m: default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344, s: ALIVE, inc: 9}, due to failed fetchMetadata call (cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured)) {noformat} [~rpuch] is aware of the issue and participated in the investigation. was (Author: v.pyatkov): The test failure is not related to the placement driver, because cluster nodes cannot join due to a problem in the discovery procedure. The root cause of the issue is in ScaleCube. 
{noformat} [2023-11-17T00:20:22,152][WARN ][sc-cluster-3345-2][MetadataStore] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][0ccd29d5-2fc2-449c-942c-ae42760adf7d] Timeout getting GetMetadataResp from 10.233.107.205:3344 within 1000 ms, cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured) [2023-11-17T00:20:22,153][WARN ][sc-cluster-3345-2][MembershipProtocol] [default:sqllogic1:1ca7b2f5308489d@10.233.107.205:3345][updateMembership][MEMBERSHIP_GOSSIP] Skipping to add/update member: {m: default:sqllogic0:6a78c57fcd0a496d@10.233.107.205:3344, s: ALIVE, inc: 9}, due to failed fetchMetadata call (cause: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'source(MonoDefer)' (and no fallback has been configured)) {noformat} [~rpuch] is aware of the issue and participated in the investigation. > Tests fail on TC because a primary replica is not assigned or does not respond > -- > > Key: IGNITE-20723 > URL: https://issues.apache.org/jira/browse/IGNITE-20723 > Project: Ignite > Issue Type: Bug >Reporter: Vladislav Pyatkov >Assignee: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > Attachments: _Integration_Tests_Module_Runner_SQL_Logic_11804.log.zip > > > TC run is available > [here|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_IntegrationTests_ModuleRunnerSqlLogic/7584713?hideProblemsFromDependencies=false=false=true=true=true]. > By my brief analysis, the issue is somewhere in the assignments: > {noformat} > [2023-10-06T13:03:51,231][INFO > ][%sqllogic0%Raft-Group-Client-1][PlacementDriverManager] Placement driver > active actor is starting. 
> [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Received > LeaseGrantedMessage for replica belonging to group=291_part_3, force=false > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%MessagingService-inbound--0][ReplicaManager] Waiting for actual > storage state, group=291_part_3 > [2023-10-06T13:08:38,981][INFO > ][%sqllogic1%JRaft-Request-Processor-5][ReplicaManager] Lease accepted, > group=291_part_3, leaseStartTime=HybridTimestamp [time=88228107862067, > physical=1696597718931, logical=51], leaseExpirationTime=HybridTimestamp > [time=88235972182016, physical=1696597838931, logical=0] > [2023-10-06T13:08:50,256][WARN > ][CompletableFutureDelayScheduler][RaftGroupServiceImpl] Recoverable error > during the request type=ActionRequestImpl occurred (will be retried on the > randomly selected node): > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) > ~[?:?] > at > java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) > ~[?:?] > at > java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) > ~[?:?] > at >
[jira] [Updated] (IGNITE-20603) Restore logical topology change event on a node restart
[ https://issues.apache.org/jira/browse/IGNITE-20603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20603: - Description: h3. *Motivation* It is possible that some events were propagated to {{ms.logicalTopology}}, but a restart happened while we were updating topologyAugmentationMap and other states in {{DistributionZoneManager#createMetastorageTopologyListener}}. That means that the augmentation that must be added to {{zone.topologyAugmentationMap}} wasn't added and we need to recover this information, or nodesAttributes wasn't propagated to MS. h3. *Definition of done* On a node restart, all states that were going to be updated during the watch event in {{DistributionZoneManager#createMetastorageTopologyListener}} must be recovered. h3. *Implementation notes* (outdated, see UPD) For every zone, compare {{MS.local.logicalTopology.revision}} with max(maxScUpFromMap, maxScDownFromMap). If {{logicalTopology.revision}} is greater than max(maxScUpFromMap, maxScDownFromMap), that means that some topology changes haven't been propagated to topologyAugmentationMap before restart and appropriate timers haven't been scheduled. To fill the gap in topologyAugmentationMap, compare {{MS.local.logicalTopology}} with {{lastSeenLogicalTopology}} and enhance topologyAugmentationMap with the nodes that did not have time to be propagated to topologyAugmentationMap before restart. {{lastSeenTopology}} is calculated in the following way: we read {{MS.local.dataNodes}}; we also take max(scaleUpTriggerKey, scaleDownTriggerKey) and retrieve all additions and removals of nodes from the topologyAugmentationMap using max(scaleUpTriggerKey, scaleDownTriggerKey) as the left bound. After that, apply these changes to the map of node counters from {{MS.local.dataNodes}} and take only the nodes with positive counters. This is the lastSeenTopology. 
Comparing it with {{MS.local.logicalTopology}} will tell us which nodes were not added or removed and weren't propagated to topologyAugmentationMap before restart. We take these differences and add them to the topologyAugmentationMap. As a revision (key for topologyAugmentationMap) take {{MS.local.logicalTopology.revision}}. It is safe to take this revision, because if some node was added to the {{ms.topology}} after immediate data nodes recalculation, this added node must restore this immediate data nodes' recalculation intent. UPD: the implementation notes are outdated; we implemented a slightly different approach: now we save the last handled topology to MS, and on restart we restore global states according to states from the local metastorage and check if the current ms.logicalTopology differs from the one that was handled in DistributionZoneManager#createMetastorageTopologyListener (we check the revisions of these events); if it does, we just repeat the logic from DistributionZoneManager#createMetastorageTopologyListener with the new logical topology from ms.logicalTopology. was: h3. *Motivation* It is possible that some events were propagated to {{ms.logicalTopology}}, but a restart happened while we were updating topologyAugmentationMap and other states in {{DistributionZoneManager#createMetastorageTopologyListener}}. That means that the augmentation that must be added to {{zone.topologyAugmentationMap}} wasn't added and we need to recover this information, or nodesAttributes wasn't propagated to MS. h3. *Definition of done* On a node restart, all states that were going to be updated during the watch event in {{DistributionZoneManager#createMetastorageTopologyListener}} must be recovered. h3. *Implementation notes* (outdated, see UPD) For every zone, compare {{MS.local.logicalTopology.revision}} with max(maxScUpFromMap, maxScDownFromMap).
If {{logicalTopology.revision}} is greater than max(maxScUpFromMap, maxScDownFromMap), that means that some topology changes haven't been propagated to topologyAugmentationMap before restart and appropriate timers haven't been scheduled. To fill the gap in topologyAugmentationMap, compare {{MS.local.logicalTopology}} with {{lastSeenLogicalTopology}} and enhance topologyAugmentationMap with the nodes that did not have time to be propagated to topologyAugmentationMap before restart. {{lastSeenTopology}} is calculated in the following way: we read {{MS.local.dataNodes}}; we also take max(scaleUpTriggerKey, scaleDownTriggerKey) and retrieve all additions and removals of nodes from the topologyAugmentationMap using max(scaleUpTriggerKey, scaleDownTriggerKey) as the left bound. After that, apply these changes to the map of node counters from {{MS.local.dataNodes}} and take only the nodes with positive counters. This is the lastSeenTopology. Comparing it with {{MS.local.logicalTopology}} will tell us which nodes were not added or removed and weren't propagated to topologyAugmentationMap before restart. We take these differences and add them to the topologyAugmentationMap.
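The lastSeenTopology gap-filling idea from the (outdated) implementation notes can be sketched roughly as follows. This is a hypothetical illustration, not Ignite code: the class, record, and method names (TopologyRecoverySketch, Augmentation, lastSeenTopology, diff) are invented for the sketch, and node identities are simplified to strings.

```java
// Sketch of the recovery idea: rebuild "lastSeenTopology" from the locally
// stored data-node counters plus the augmentation entries recorded after the
// last applied trigger revision, then diff it against the logical topology to
// find join/leave events that were lost on restart. Illustrative names only.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;

class TopologyRecoverySketch {
    /** A recorded topology change: a set of nodes added or removed at some revision. */
    record Augmentation(Set<String> nodes, boolean addition) {}

    static Set<String> lastSeenTopology(
            Map<String, Integer> dataNodeCounters,           // analogue of MS.local.dataNodes
            NavigableMap<Long, Augmentation> augmentationMap,
            long lastTriggerRevision) {                      // max(scaleUp, scaleDown) trigger key
        Map<String, Integer> counters = new HashMap<>(dataNodeCounters);
        // Re-apply every augmentation recorded strictly after the trigger revision.
        augmentationMap.tailMap(lastTriggerRevision, false).values().forEach(aug ->
                aug.nodes().forEach(n ->
                        counters.merge(n, aug.addition() ? 1 : -1, Integer::sum)));
        // Nodes with a positive counter form the last seen topology.
        Set<String> result = new HashSet<>();
        counters.forEach((node, cnt) -> { if (cnt > 0) result.add(node); });
        return result;
    }

    /** Symmetric difference: the nodes whose join/leave events were lost on restart. */
    static Set<String> diff(Set<String> logicalTopology, Set<String> lastSeen) {
        Set<String> union = new HashSet<>(logicalTopology);
        union.addAll(lastSeen);
        Set<String> both = new HashSet<>(logicalTopology);
        both.retainAll(lastSeen);
        union.removeAll(both);
        return union;
    }
}
```

The differences returned by diff would then be appended to the augmentation map under the current logical-topology revision, as the notes describe.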
[jira] [Updated] (IGNITE-20603) Restore logical topology change event on a node restart
[ https://issues.apache.org/jira/browse/IGNITE-20603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20603: - Summary: Restore logical topology change event on a node restart (was: Restore topologyAugmentationMap on a node restart) > Restore logical topology change event on a node restart > --- > > Key: IGNITE-20603 > URL: https://issues.apache.org/jira/browse/IGNITE-20603 > Project: Ignite > Issue Type: Bug > Reporter: Mirza Aliev > Assignee: Mirza Aliev > Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > h3. *Motivation* > It is possible that some events were propagated to {{ms.logicalTopology}}, but a restart happened while we were updating topologyAugmentationMap in {{DistributionZoneManager#createMetastorageTopologyListener}}. That means that an augmentation that must be added to {{zone.topologyAugmentationMap}} wasn't added and we need to recover this information. > h3. *Definition of done* > On a node restart, topologyAugmentationMap must be correctly restored according to the {{ms.logicalTopology}} state. > h3. *Implementation notes* > (outdated, see UPD) > For every zone, compare {{MS.local.logicalTopology.revision}} with max(maxScUpFromMap, maxScDownFromMap). If {{logicalTopology.revision}} is greater, some topology changes were not propagated to topologyAugmentationMap before the restart and the corresponding timers were not scheduled. To fill the gap in topologyAugmentationMap, compare {{MS.local.logicalTopology}} with {{lastSeenLogicalTopology}} and enhance topologyAugmentationMap with the nodes that did not make it into topologyAugmentationMap before the restart. 
> {{lastSeenTopology}} is calculated as follows: read {{MS.local.dataNodes}}, take max(scaleUpTriggerKey, scaleDownTriggerKey), and retrieve all node additions and removals from the topologyAugmentationMap using that maximum as the left bound. Then apply these changes to the map of node counters from {{MS.local.dataNodes}} and keep only the nodes with positive counters. This is the lastSeenTopology. Comparing it with {{MS.local.logicalTopology}} will tell us which nodes were added or removed but not propagated to topologyAugmentationMap before the restart. We take these differences and add them to the topologyAugmentationMap, using {{MS.local.logicalTopology.revision}} as the revision (key for topologyAugmentationMap). It is safe to take this revision because, if some node was added to {{ms.topology}} after an immediate data nodes recalculation, this node must restore that recalculation intent. > UPD: The implementation notes are outdated; we implemented a slightly different approach: we now save the last handled topology to the MS, and on restart we check whether the current ms.logicalTopology differs from the one that was last handled in DistributionZoneManager#createMetastorageTopologyListener (we compare the revisions of these events); if so, we repeat the logic of DistributionZoneManager#createMetastorageTopologyListener with the new logical topology from ms.logicalTopology. -- This message was sent by Atlassian Jira (v8.20.10#820010)
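The implemented approach from the UPD note, i.e. persisting the last handled topology together with its revision and replaying the listener logic on restart when the current ms.logicalTopology is newer, reduces to a simple revision comparison. The sketch below is illustrative only; the record and method names are hypothetical and do not come from the Ignite codebase.

```java
import java.util.Set;

public class TopologyReplaySketch {
    /** Persisted snapshot: the topology the listener handled last, with its MS revision. */
    record HandledTopology(Set<String> nodes, long revision) {}

    /**
     * Restart check: if nothing was ever handled, or the current ms.logicalTopology
     * revision is newer than the persisted one, the logic of
     * DistributionZoneManager#createMetastorageTopologyListener must be replayed
     * with the fresh logical topology.
     */
    static boolean needsReplay(HandledTopology lastHandled, long currentLogicalTopologyRevision) {
        return lastHandled == null || lastHandled.revision() < currentLogicalTopologyRevision;
    }
}
```

A snapshot at revision 7 needs no replay against a current topology at revision 7, but does against revision 9; a missing snapshot always triggers a replay.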