[jira] [Updated] (IGNITE-21313) Incorrect behaviour when invalid zone filter is applied to zone

2024-01-19 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-21313:
---
Description: 
Let's consider this code to be run in a test:

 
{code:java}
sql("CREATE ZONE ZONE1 WITH DATA_NODES_FILTER = 'INCORRECT_FILTER'");
sql("CREATE TABLE TEST(ID INT PRIMARY KEY, VAL0 INT) WITH 
PRIMARY_ZONE='ZONE1'"); {code}
 The current behaviour is that the test hangs, spamming the following error:

 
{noformat}
[2024-01-19T12:56:25,163][ERROR][%ictdt_n_0%metastorage-watch-executor-2][WatchProcessor]
 Error occurred when notifying safe time advanced callback
 java.util.concurrent.CompletionException: 
com.jayway.jsonpath.PathNotFoundException: No results for path: 
$['INCORRECT_FILTER']
    at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
 [?:?]
    at 
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2257)
 [?:?]
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.notifyWatches(WatchProcessor.java:213)
 ~[main/:?]
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$notifyWatches$3(WatchProcessor.java:169)
 ~[main/:?]
    at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
 [?:?]
    at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
 [?:?]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: com.jayway.jsonpath.PathNotFoundException: No results for path: 
$['INCORRECT_FILTER']{noformat}
 

We need to fix this and define the expected reaction to an incorrect filter.

 

*Implementation notes:*

To fix it, we need to change the implementation of DistributionZonesUtil#filter.

Instead of 
{code:java}
List> res = JsonPath.read(convertedAttributes, filter);{code}
we need to use
{code:java}
Configuration configuration = new Configuration.ConfigurationBuilder()
.options(Option.SUPPRESS_EXCEPTIONS, Option.ALWAYS_RETURN_LIST)
.build();

List> res = 
JsonPath.using(configuration).parse(convertedAttributes).read(filter);{code}
In this case, an incorrect filter will not cause a PathNotFoundException; 'res' will simply be empty.
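For illustration, here is a minimal, self-contained sketch of the lenient JsonPath configuration described above. The sample JSON string and the class name are placeholders made up for this example; they are not the exact attribute structure that Ignite builds:
{code:java}
import com.jayway.jsonpath.Configuration;
import com.jayway.jsonpath.JsonPath;
import com.jayway.jsonpath.Option;
import com.jayway.jsonpath.PathNotFoundException;

import java.util.List;

public class LenientJsonPathSketch {
    public static void main(String[] args) {
        // Placeholder for the converted node attributes.
        String convertedAttributes = "{\"region\": \"US\", \"storage\": \"SSD\"}";

        // Default behaviour: a definite path that matches nothing throws PathNotFoundException.
        try {
            JsonPath.read(convertedAttributes, "$['INCORRECT_FILTER']");
        } catch (PathNotFoundException e) {
            System.out.println("default config: " + e.getMessage());
        }

        // Lenient configuration from the implementation notes: exceptions are
        // suppressed and an empty list is returned instead.
        Configuration configuration = new Configuration.ConfigurationBuilder()
                .options(Option.SUPPRESS_EXCEPTIONS, Option.ALWAYS_RETURN_LIST)
                .build();

        List<Object> res = JsonPath.using(configuration)
                .parse(convertedAttributes)
                .read("$['INCORRECT_FILTER']");

        System.out.println("lenient config, empty result: " + res.isEmpty()); // true
    }
}
{code}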

  was:
Let's consider this code to be run in a test:

 
{code:java}
sql("CREATE ZONE ZONE1 WITH DATA_NODES_FILTER = 'INCORRECT_FILTER'");
sql("CREATE TABLE TEST(ID INT PRIMARY KEY, VAL0 INT) WITH 
PRIMARY_ZONE='ZONE1'"); {code}
 The current behaviour is that the test hangs, spamming the following error:

 
{noformat}
[2024-01-19T12:56:25,163][ERROR][%ictdt_n_0%metastorage-watch-executor-2][WatchProcessor]
 Error occurred when notifying safe time advanced callback
 java.util.concurrent.CompletionException: 
com.jayway.jsonpath.PathNotFoundException: No results for path: 
$['INCORRECT_FILTER']
    at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
 [?:?]
    at 
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2257)
 [?:?]
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.notifyWatches(WatchProcessor.java:213)
 ~[main/:?]
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$notifyWatches$3(WatchProcessor.java:169)
 ~[main/:?]
    at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
 [?:?]
    at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
 [?:?]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: com.jayway.jsonpath.PathNotFoundException: No results for path: 
$['INCORRECT_FILTER']{noformat}
 

We need to fix this and define the expected reaction to an incorrect filter.

 

*Implementation notes:*

To fix it, we need to change the DistributionZonesUtil#filter implementation.

Instead of 
{code:java}
List> res = JsonPath.read(convertedAttributes, filter);{code}
we need to use
{code:java}
Configuration configuration = new Configuration.ConfigurationBuilder()
.options(Option.SUPPRESS_EXCEPTIONS, 

[jira] [Updated] (IGNITE-21313) Incorrect behaviour when invalid zone filter is applied to zone

2024-01-19 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-21313:
---
Description: 
Let's consider this code to be run in a test:

 
{code:java}
sql("CREATE ZONE ZONE1 WITH DATA_NODES_FILTER = 'INCORRECT_FILTER'");
sql("CREATE TABLE TEST(ID INT PRIMARY KEY, VAL0 INT) WITH 
PRIMARY_ZONE='ZONE1'"); {code}
 The current behaviour is that the test hangs, spamming the following error:

 
{noformat}
[2024-01-19T12:56:25,163][ERROR][%ictdt_n_0%metastorage-watch-executor-2][WatchProcessor]
 Error occurred when notifying safe time advanced callback
 java.util.concurrent.CompletionException: 
com.jayway.jsonpath.PathNotFoundException: No results for path: 
$['INCORRECT_FILTER']
    at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
 [?:?]
    at 
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2257)
 [?:?]
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.notifyWatches(WatchProcessor.java:213)
 ~[main/:?]
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$notifyWatches$3(WatchProcessor.java:169)
 ~[main/:?]
    at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
 [?:?]
    at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
 [?:?]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: com.jayway.jsonpath.PathNotFoundException: No results for path: 
$['INCORRECT_FILTER']{noformat}
 

We need to fix this and define the expected reaction to an incorrect filter.

 

*Implementation notes:*

To fix it, we need to change the DistributionZonesUtil#filter implementation.

Instead of 
{code:java}
List> res = JsonPath.read(convertedAttributes, filter);{code}
we need to use
{code:java}
Configuration configuration = new Configuration.ConfigurationBuilder()
.options(Option.SUPPRESS_EXCEPTIONS, Option.ALWAYS_RETURN_LIST)
.build();

List> res = 
JsonPath.using(configuration).parse(convertedAttributes).read(filter);{code}
In this case, an incorrect filter will not cause a PathNotFoundException; 'res' will simply be empty.

 

  was:
Let's consider this code to be run in a test:

 
{code:java}
sql("CREATE ZONE ZONE1 WITH DATA_NODES_FILTER = 'INCORRECT_FILTER'");
sql("CREATE TABLE TEST(ID INT PRIMARY KEY, VAL0 INT) WITH 
PRIMARY_ZONE='ZONE1'"); {code}
 The current behaviour is that the test hangs, spamming the following error:

 
{noformat}
[2024-01-19T12:56:25,163][ERROR][%ictdt_n_0%metastorage-watch-executor-2][WatchProcessor]
 Error occurred when notifying safe time advanced callback
 java.util.concurrent.CompletionException: 
com.jayway.jsonpath.PathNotFoundException: No results for path: 
$['INCORRECT_FILTER']
    at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
 ~[?:?]
    at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
 [?:?]
    at 
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2257)
 [?:?]
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.notifyWatches(WatchProcessor.java:213)
 ~[main/:?]
    at 
org.apache.ignite.internal.metastorage.server.WatchProcessor.lambda$notifyWatches$3(WatchProcessor.java:169)
 ~[main/:?]
    at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
 [?:?]
    at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
 [?:?]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: com.jayway.jsonpath.PathNotFoundException: No results for path: 
$['INCORRECT_FILTER']{noformat}
 

We need to fix this and define the expected reaction to an incorrect filter.


> Incorrect behaviour when invalid zone filter is applied to zone 
> 
>
> Key: IGNITE-21313
> URL: https://issues.apache.org/jira/browse/IGNITE-21313
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza 

[jira] [Updated] (IGNITE-20412) Fix DistributionZoneCausalityDataNodesTest.java#checkDataNodesRepeated

2023-12-14 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20412:
---
Summary: Fix 
DistributionZoneCausalityDataNodesTest.java#checkDataNodesRepeated  (was: Fix 
ItIgniteDistributionZoneManagerNodeRestartTest# 
testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart)

> Fix DistributionZoneCausalityDataNodesTest.java#checkDataNodesRepeated
> --
>
> Key: IGNITE-20412
> URL: https://issues.apache.org/jira/browse/IGNITE-20412
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
>  started to fail in the catalog-feature branch and fails in the main branch 
> after catalog-feature was merged
> [https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=]
> {code:java}
> java.lang.AssertionError:
> Expected: is <[]>
>  but: was <[A]>
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
> at 
> org.apache.ignite.internal.distributionzones.DistributionZonesTestUtil.assertValueInStorage(DistributionZonesTestUtil.java:459)
> at 
> org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest.testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:539)
> {code}
> h3. Implementation notes
> The root cause:
>  # This test changes metaStorageManager behavior and it throws expected 
> exception on ms.invoke.
> # The test alters the zone with a new filter.
> # DistributionZoneManager#onUpdateFilter returns a future from 
> saveDataNodesToMetaStorageOnScaleUp(zoneId, causalityToken).
>  # The future is completed exceptionally and 
> WatchProcessor#notificationFuture will be completed exceptionally.
>  # Next updates will not be handled properly because notificationFuture is 
> completed exceptionally.
> We have already created tickets about exception handling:
>  * https://issues.apache.org/jira/browse/IGNITE-14693
>  * https://issues.apache.org/jira/browse/IGNITE-14611
>  
> The test scenario is incorrect because the node should be stopped (by the 
> failure handler) if the ms.invoke fails. We need to rewrite it when the DZM 
> restart logic is updated.
> UPD1:
> I tried to rewrite the test so that it does not throw an exception in the 
> metastorage handler but instead forces the thread to wait inside the invoke. 
> However, because we use a spy on the standalone metastorage, and Mockito uses 
> a synchronized block when we call ms.invoke, blocking one invoke blocks all 
> other communication with the metastorage.
> Further investigation is needed into how to rewrite this test.
>  
> UPD2:
> The testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart 
> test was removed in another commit. However, another test was disabled by 
> this ticket, and it is now fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20412) Fix ItIgniteDistributionZoneManagerNodeRestartTest# testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart

2023-12-14 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20412:
---
Description: 
h3. Motivation

org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
 started to fail in the catalog-feature branch and fails in the main branch 
after catalog-feature was merged

[https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=]
{code:java}
java.lang.AssertionError:
Expected: is <[]>
 but: was <[A]>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at 
org.apache.ignite.internal.distributionzones.DistributionZonesTestUtil.assertValueInStorage(DistributionZonesTestUtil.java:459)
at 
org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest.testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:539)
{code}
h3. Implementation notes

The root cause:
 # This test changes metaStorageManager behavior and it throws expected 
exception on ms.invoke.
 # The test alters the zone with a new filter.
 # DistributionZoneManager#onUpdateFilter returns a future from 
saveDataNodesToMetaStorageOnScaleUp(zoneId, causalityToken).
 # The future is completed exceptionally and WatchProcessor#notificationFuture 
will be completed exceptionally.
 # Next updates will not be handled properly because notificationFuture is 
completed exceptionally.

We have already created tickets about exception handling:
 * https://issues.apache.org/jira/browse/IGNITE-14693
 * https://issues.apache.org/jira/browse/IGNITE-14611

 

The test scenario is incorrect because the node should be stopped (by the 
failure handler) if the ms.invoke fails. We need to rewrite it when the DZM 
restart logic is updated.

UPD1:

I tried to rewrite the test so that it does not throw an exception in the 
metastorage handler but instead forces the thread to wait inside the invoke. 
However, because we use a spy on the standalone metastorage, and Mockito uses a 
synchronized block when we call ms.invoke, blocking one invoke blocks all other 
communication with the metastorage.

Further investigation is needed into how to rewrite this test.

 

UPD2:

The testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart test 
was removed in another commit. However, another test was disabled by this 
ticket, and it is now fixed.

  was:
h3. Motivation

org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
 started to fail in the catalog-feature branch and fails in the main branch 
after catalog-feature was merged

[https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=]
{code:java}
java.lang.AssertionError:
Expected: is <[]>
 but: was <[A]>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at 
org.apache.ignite.internal.distributionzones.DistributionZonesTestUtil.assertValueInStorage(DistributionZonesTestUtil.java:459)
at 
org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest.testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:539)
{code}
h3. Implementation notes

The root cause:
 # This test changes metaStorageManager behavior and it throws expected 
exception on ms.invoke.
 # The test alters zone with new filter.
 # DistributionZoneManager#onUpdateFilter return a future from 
saveDataNodesToMetaStorageOnScaleUp(zoneId, causalityToken)
 # The future is completed exceptionally and WatchProcessor#notificationFuture 
will be completed exceptionally.
 # Next updates will not be handled properly because notificationFuture is 
completed exceptionally.

We have already created tickets about exception handling:
 * https://issues.apache.org/jira/browse/IGNITE-14693
 * https://issues.apache.org/jira/browse/IGNITE-14611

 

The test scenario is incorrect because the node should be stopped (by failure 
handler) if the ms.invoke failed. We need to rewrite it when the DZM restart 
will be updated.

UPD1:

I tried to rewrite the test so that it does not throw an exception in the 
metastorage handler but instead forces the thread to wait inside the invoke. 
However, because we use a spy on the standalone metastorage, and Mockito uses a 
synchronized block when we call ms.invoke, blocking one invoke blocks all other 
communication with the metastorage.

Further investigation is needed into how to rewrite this test.

 


[jira] [Updated] (IGNITE-20412) Fix ItIgniteDistributionZoneManagerNodeRestartTest# testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart

2023-12-14 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20412:
---
Description: 
h3. Motivation

org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
 started to fail in the catalog-feature branch and fails in the main branch 
after catalog-feature was merged

[https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=]
{code:java}
java.lang.AssertionError:
Expected: is <[]>
 but: was <[A]>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at 
org.apache.ignite.internal.distributionzones.DistributionZonesTestUtil.assertValueInStorage(DistributionZonesTestUtil.java:459)
at 
org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest.testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:539)
{code}
h3. Implementation notes

The root cause:
 # This test changes metaStorageManager behavior and it throws expected 
exception on ms.invoke.
 # The test alters the zone with a new filter.
 # DistributionZoneManager#onUpdateFilter returns a future from 
saveDataNodesToMetaStorageOnScaleUp(zoneId, causalityToken).
 # The future is completed exceptionally and WatchProcessor#notificationFuture 
will be completed exceptionally.
 # Next updates will not be handled properly because notificationFuture is 
completed exceptionally.

We have already created tickets about exception handling:
 * https://issues.apache.org/jira/browse/IGNITE-14693
 * https://issues.apache.org/jira/browse/IGNITE-14611

 

The test scenario is incorrect because the node should be stopped (by the 
failure handler) if the ms.invoke fails. We need to rewrite it when the DZM 
restart logic is updated.

UPD1:

I tried to rewrite the test so that it does not throw an exception in the 
metastorage handler but instead forces the thread to wait inside the invoke. 
However, because we use a spy on the standalone metastorage, and Mockito uses a 
synchronized block when we call ms.invoke, blocking one invoke blocks all other 
communication with the metastorage.

Further investigation is needed into how to rewrite this test.

 

UPD2:

 

  was:
h3. Motivation

org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
 started to fail in the catalog-feature branch and fails in the main branch 
after catalog-feature was merged

[https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=]
{code:java}
java.lang.AssertionError:
Expected: is <[]>
 but: was <[A]>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at 
org.apache.ignite.internal.distributionzones.DistributionZonesTestUtil.assertValueInStorage(DistributionZonesTestUtil.java:459)
at 
org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest.testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:539)
{code}
h3. Implementation notes

The root cause:
 # This test changes metaStorageManager behavior and it throws expected 
exception on ms.invoke.
 # The test alters zone with new filter.
 # DistributionZoneManager#onUpdateFilter return a future from 
saveDataNodesToMetaStorageOnScaleUp(zoneId, causalityToken)
 # The future is completed exceptionally and WatchProcessor#notificationFuture 
will be completed exceptionally.
 # Next updates will not be handled properly because notificationFuture is 
completed exceptionally.

We have already created tickets about exception handling:
 * https://issues.apache.org/jira/browse/IGNITE-14693
 * https://issues.apache.org/jira/browse/IGNITE-14611

 

The test scenario is incorrect because the node should be stopped (by failure 
handler) if the ms.invoke failed. We need to rewrite it when the DZM restart 
will be updated.


UPD1:

I tried to rewrite the test so that it does not throw an exception in the 
metastorage handler but instead forces the thread to wait inside the invoke. 
However, because we use a spy on the standalone metastorage, and Mockito uses a 
synchronized block when we call ms.invoke, blocking one invoke blocks all other 
communication with the metastorage.

Further investigation is needed into how to rewrite this test.


> Fix ItIgniteDistributionZoneManagerNodeRestartTest# 
> testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
> 

[jira] [Commented] (IGNITE-19955) Fix create zone on restart rewrites existing data nodes because of trigger key inconsistnecy

2023-12-14 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-19955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796615#comment-17796615
 ] 

Sergey Uttsel commented on IGNITE-19955:


LGTM

> Fix create zone on restart rewrites existing data nodes because of trigger 
> key inconsistnecy
> 
>
> Key: IGNITE-19955
> URL: https://issues.apache.org/jira/browse/IGNITE-19955
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Outdated, see UPD
> Currently the initialisation logic for a newly created zone writes the 
> following keys
> {noformat}
> zoneDataNodesKey(zoneId), zoneScaleUpChangeTriggerKey(zoneId), 
> zoneScaleDownChangeTriggerKey(zoneId), zonesChangeTriggerKey(zoneId)
> {noformat}
> to metastorage, and condition is 
> {noformat}
> static CompoundCondition triggerKeyConditionForZonesChanges(long 
> revision, int zoneId) {
> return or(
> notExists(zonesChangeTriggerKey(zoneId)),
> 
> value(zonesChangeTriggerKey(zoneId)).lt(ByteUtils.longToBytes(revision))
> );
> {noformat}
> The recovery process implies that the create zone event will be processed 
> again, but with a higher revision, so the data nodes will be rewritten.
> We need to handle this situation so that the data nodes stay consistent 
> after a restart.
> Possible solution is to change condition to 
> {noformat}
> static SimpleCondition triggerKeyConditionForZonesCreation(long revision, 
> int zoneId) {
> return notExists(zonesChangeTriggerKey(zoneId));
> }
> static SimpleCondition triggerKeyConditionForZonesDelete(int zoneId) {
> return exists(zonesChangeTriggerKey(zoneId));
> }
> {noformat}
>  
> so that we do not rely on the revision and only check the existence of the 
> key when we create or remove a zone. The problem with this solution is that 
> reordering of the create and remove on some node could lead to an 
> inconsistent state for the zone keys in the metastorage.
> *UPD*:
> This problem will be resolved once we implement 
> https://issues.apache.org/jira/browse/IGNITE-20561
> In this ticket we need to unmute all tickets that were muted by this ticket



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20605) Restore scaleUp/scaleDown timers

2023-12-01 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791980#comment-17791980
 ] 

Sergey Uttsel commented on IGNITE-20605:


LGTM

> Restore scaleUp/scaleDown timers
> 
>
> Key: IGNITE-20605
> URL: https://issues.apache.org/jira/browse/IGNITE-20605
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> We need to restore timers that were scheduled before node restart.
> h3. *Definition of done*
> Timers are rescheduled after restart
> h3. *Implementation notes*
> It is valid to simply schedule local timers according to the 
> scaleUp/scaleDown timer values from the Catalog, and as a revision take 
> maxScUpFromMap or maxScDownFromMap from topologyAugmentationMap, where 
> maxScUpFromMap and maxScDownFromMap are the maximum revisions of the 
> topologyAugmentationMap entries associated with the addition and removal of 
> nodes, respectively.
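As a purely illustrative aside, the sketch below shows one way the two revisions mentioned above could be derived. The map type and the boolean "addition" flag are simplifications invented for the example; the real topologyAugmentationMap entries carry more information:
{code:java}
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RestoreTimerRevisionsSketch {
    public static void main(String[] args) {
        // Toy model: key = revision, value = true for a node addition, false for a removal.
        NavigableMap<Long, Boolean> topologyAugmentationMap = new TreeMap<>();
        topologyAugmentationMap.put(10L, true);   // node added at revision 10
        topologyAugmentationMap.put(15L, false);  // node removed at revision 15
        topologyAugmentationMap.put(21L, true);   // node added at revision 21

        // Max revision among entries associated with node additions.
        long maxScUpFromMap = topologyAugmentationMap.entrySet().stream()
                .filter(Map.Entry::getValue)
                .mapToLong(Map.Entry::getKey)
                .max()
                .orElse(-1);

        // Max revision among entries associated with node removals.
        long maxScDownFromMap = topologyAugmentationMap.entrySet().stream()
                .filter(e -> !e.getValue())
                .mapToLong(Map.Entry::getKey)
                .max()
                .orElse(-1);

        System.out.println(maxScUpFromMap);   // 21
        System.out.println(maxScDownFromMap); // 15
    }
}
{code}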



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20939) Extract Distribution zones integration tests from runner module to separate one

2023-11-28 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-20939:
--

Assignee: Sergey Uttsel

> Extract Distribution zones integration tests from runner module to separate 
> one
> ---
>
> Key: IGNITE-20939
> URL: https://issues.apache.org/jira/browse/IGNITE-20939
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The runner module incorporates a large number of tests related to other 
> modules, which leads to a long running time for the suite on TC.
> Currently, integration tests for Distribution Zones are located in the 
> ignite-runner module, so we need to extract them into a separate module via 
> the runner test-fixtures support to decrease the execution time of tests for 
> the runner module.
> IGNITE-20670 can be used as a reference for such activities.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20559) Return metastorage invokes in DistributionZoneManager#createMetastorageTopologyListener

2023-11-14 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785924#comment-17785924
 ] 

Sergey Uttsel commented on IGNITE-20559:


LGTM

> Return metastorage invokes in 
> DistributionZoneManager#createMetastorageTopologyListener
> ---
>
> Key: IGNITE-20559
> URL: https://issues.apache.org/jira/browse/IGNITE-20559
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> There are meta storage invokes in the DistributionZoneManager zone 
> lifecycle. The futures of these invokes are ignored, so after a lifecycle 
> method completes, not all of its actions have actually completed. Therefore, 
> several invokes (for example, on createZone and alterZone) can be reordered. 
> Currently the meta storage invokes are performed in:
> # LogicalTopologyEventListener to update the logical topology.
> Also we need to save {{nodesAttributes}} and {{topologyAugmentationMap}} in 
> the MS.
> h3. *Definition of Done*
> We need to ensure event handling linearization. All immediate data nodes 
> recalculations must be returned to the event handler. Also, 
> {{nodesAttributes}} and {{topologyAugmentationMap}} must be saved in the MS, 
> so we can use these fields when recovering the DZM.
> h3. *Implementation notes*
> When topology update is handled (createMetastorageTopologyListener), 
> immediately recalculate data nodes within caller handler for all zones, which 
> have immediate timer. Also within the caller handler, write nodesAttributes 
> and topologyAugmentationMaps to metastore. Only after completion of this ms 
> invoke, schedule local timers. All futures of these changes must be returned 
> as a result of the watch listener update, so this update could be marked as 
> processed only after all above mentioned actions are completed.
> For CAS-ing nodesAttributes and topologyAugmentationMaps, we can reuse 
> DistributionZonesUtil#zonesGlobalStateRevision, but change it from vault key 
> to MS and make it per zone. Every time we try to save these keys, we will 
> take revision of topology update (topRev) and will try to write changes with 
> condition topRev >  ms.zonesGlobalStateRevision
> To clean up this map, we can remove all augmentations that are less than 
> min(scaleUpTriggerKeys, scaleDownTriggerKeys) when we write the new one.
> Further optimization: we can save only one topologyAugmentationMap; the only 
> question is how to clean up the map. We can find the minimal 
> min(scaleUpTriggerKeys, scaleDownTriggerKeys) among all zones and clean up 
> everything below this minimum.
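For illustration only, a toy sketch of the cleanup rule above; the entry values are simplified to strings and the method and parameter names are invented for the example:
{code:java}
import java.util.NavigableMap;
import java.util.TreeMap;

public class AugmentationCleanupSketch {
    // Drop every augmentation entry whose revision is below
    // min(scaleUpTriggerRevision, scaleDownTriggerRevision).
    static void cleanUp(NavigableMap<Long, String> topologyAugmentationMap,
                        long scaleUpTriggerRevision,
                        long scaleDownTriggerRevision) {
        long minTrigger = Math.min(scaleUpTriggerRevision, scaleDownTriggerRevision);

        // headMap(minTrigger, false) is a live view of the keys strictly below
        // minTrigger, so clearing it removes exactly the stale entries.
        topologyAugmentationMap.headMap(minTrigger, false).clear();
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> map = new TreeMap<>();
        map.put(5L, "node A added");
        map.put(12L, "node B removed");
        map.put(20L, "node C added");

        cleanUp(map, 15L, 13L);

        System.out.println(map.keySet()); // [20]
    }
}
{code}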



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-16431) Entry expiration requires twice the entry size of heap

2023-11-08 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-16431:
--

Assignee: Sergey Uttsel

> Entry expiration requires twice the entry size of heap
> --
>
> Key: IGNITE-16431
> URL: https://issues.apache.org/jira/browse/IGNITE-16431
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.12
>Reporter: Alexey Kukushkin
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: cggg
> Attachments: 500MB-put-expiry-master.png
>
>   Original Estimate: 64h
>  Remaining Estimate: 64h
>
> Ignite takes twice the entry size of heap to expire an entry when 
> {{eagerTtl=true}}. See the attached heap memory usage diagram of putting and 
> then expiring a 500MB entry in Ignite.
> This makes Ignite inefficient at handling large objects, causing 
> {{OutOfMemory}} errors.
> Do we really need to load the entry's value on heap at all to expire the 
> entry? Please enhance Ignite cache entry expiration not to load the entry's 
> value on heap even once, or explain why it is not possible.
> !500MB-put-expiry-master.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-20160) NullPointerException in FSMCallerImpl.doCommitted

2023-11-07 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel resolved IGNITE-20160.

Resolution: Duplicate

Fixed in https://issues.apache.org/jira/browse/IGNITE-20774

> NullPointerException in FSMCallerImpl.doCommitted
> -
>
> Key: IGNITE-20160
> URL: https://issues.apache.org/jira/browse/IGNITE-20160
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Sergey Uttsel
>Priority: Blocker
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:496)
> at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:448)
> at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:136)
> at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:130)
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:226)
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
> at 
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
> {code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_IntegrationTests_ModuleRunnerSqlLogic/7410174?hideProblemsFromDependencies=false=false=true=true]
> It happens here (see FSMCallerImpl#doCommitted):
> {code:java}
> final IteratorImpl iterImpl = new IteratorImpl(this.fsm, this.logManager, 
> closures, firstClosureIndex,
> lastAppliedIndex, committedIndex, this.applyingIndex, 
> this.node.getOptions());{code}
> on the second line, most likely when resolving the null reference to *node*, 
> which is nullified on FSMCaller shutdown. Raft groups were being stopped at 
> that moment.
> *Implementation details*
> A simple fix to avoid the NPE at the aforementioned line would be to check 
> `node` for null. 
> Additionally, it would be nice to check `shutdownLatch` in `doCommitted` and 
> make sure we call `unsubscribe` in the proper order, because doCommitted is 
> called from a disruptor callback.
> One more observation: the reference to the node is set to null in 
> `shutdown`, which is called before `join`, where we unsubscribe from 
> Disruptor notifications. There is a small chance that something comes into 
> FSMCallerImpl after shutdown but before join.
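As an illustration of the guard pattern suggested above (not the actual FSMCallerImpl code; the Node class here is a stand-in), copying the possibly-nulled reference into a local before checking it avoids the NPE even if shutdown races with the apply task:
{code:java}
public class NullGuardSketch {
    static class Node {
        String options() {
            return "opts";
        }
    }

    private volatile Node node = new Node();

    void shutdown() {
        // Mirrors the behaviour described above: the node reference is nulled on shutdown.
        node = null;
    }

    void doCommitted() {
        // Read the volatile field once so a concurrent shutdown cannot null it
        // between the check and the dereference.
        Node localNode = this.node;

        if (localNode == null) {
            // Shutting down: skip the apply task instead of throwing an NPE.
            return;
        }

        System.out.println(localNode.options());
    }

    public static void main(String[] args) {
        NullGuardSketch caller = new NullGuardSketch();
        caller.doCommitted(); // handled normally
        caller.shutdown();
        caller.doCommitted(); // silently skipped
    }
}
{code}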



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-10-13 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774956#comment-17774956
 ] 

Sergey Uttsel commented on IGNITE-20317:


Now LGTM

> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> There are meta storage invokes in the DistributionZoneManager zone 
> lifecycle. The futures of these invokes are ignored, so after a lifecycle 
> method completes, not all of its actions have actually completed. Therefore, 
> several invokes (for example, on createZone and alterZone) can be reordered. 
> Currently the meta storage invokes are performed in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # DistributionZoneManager#onUpdateScaleUp
> # DistributionZoneManager#onUpdateScaleDown
> -DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
> replicas update.-
> -LogicalTopologyEventListener to update logical topology.-
> -DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
> watch listener to update pending assignments.-
> h3. *Definition of Done*
> We need to ensure event handling linearization. All immediate data nodes 
> recalculations must be returned to the event handler.
> h3. *Implementation Notes*
> * ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
> DistributionZoneManager#onUpdateFilter and 
> DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
> listeners. So we can just return the ms invoke future from these methods, 
> which ensures that the invoke will be completed within the current event 
> handling.
> * We cannot return a future from LogicalTopologyEventListener's methods. We 
> can ignore these futures. This has a drawback: we can skip a topology update:
> # topology=[A,B], dataNodes=[A,B], scaleUp=0, scaleDown=100
> # Node C joined the topology and left quickly, and the ms invokes to update 
> the topology entry were reordered.
> # The data nodes were not updated immediately to [A,B,C].
> We think that we can ignore this bug because eventually it doesn't break the 
> consistency of the data nodes. For this purpose we need to change the invoke 
> condition:
> `value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
>  instead of
> `value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() 
> - 1))`
> * Need to return ms invoke futures from WatchListener#onUpdate method of the 
> data nodes listener.
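To make the reasoning above concrete, here is a small self-contained toy model (hypothetical names, not the real meta storage API) comparing the old eq(version - 1) condition with the proposed lt(version) condition when an intermediate update is lost or reordered:
{code:java}
public class TopologyVersionConditionSketch {
    public static void main(String[] args) {
        long stored = 1;           // current value of the topology version key
        long[] arriving = {3, 4};  // version 2 was skipped (its invoke was lost or reordered)

        long eqStored = stored;
        long ltStored = stored;

        for (long version : arriving) {
            // Old condition: apply the update only if stored == version - 1.
            if (eqStored == version - 1) {
                eqStored = version;
            }

            // Proposed condition: apply the update if stored < version.
            if (ltStored < version) {
                ltStored = version;
            }
        }

        System.out.println("eq condition, stored version: " + eqStored); // still 1, stuck forever
        System.out.println("lt condition, stored version: " + ltStored); // 4, converges
    }
}
{code}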



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-10-13 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774878#comment-17774878
 ] 

Sergey Uttsel edited comment on IGNITE-20317 at 10/13/23 11:45 AM:
---

Some action points of the ticket were not implemented:

 
{code:java}
LogicalTopologyEventListener to update logical topology.
we need to change the invoke condition:
`value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
 instead of
`value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() - 
1))`
{code}
 
{code:java}
DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener watch 
listener to update pending assignments.{code}


was (Author: sergey uttsel):
Some action points of the ticket were not implemented:
 # LogicalTopologyEventListener to update logical topology.
 # DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
watch listener to update pending assignments.

> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Therefore 
> several invokes for example on createZone and alterZone can be reordered. 
> Currently it does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # DistributionZoneManager#onUpdateScaleUp
> # DistributionZoneManager#onUpdateScaleDown
> -DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
> replicas update.-
> -LogicalTopologyEventListener to update logical topology.-
> -DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
> watch listener to update pending assignments.-
> h3. *Definition of Done*
> Need to ensure event handling linearization. All immediate data nodes 
> recalculation must be returned  to the event handler.
> h3. *Implementation Notes*
> * ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
> DistributionZoneManager#onUpdateFilter and 
> DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
> listeners. So we can  just return the ms invoke future  from these methods 
> and it ensure, that this invoke will be completed within the current event 
> handling.
> * We cannot return a future from LogicalTopologyEventListener's methods. We 
> can ignore these futures. This has a drawback: we can skip a topology update:
> # topology=[A,B], dataNodes=[A,B], scaleUp=0, scaleDown=100
> # Node C was joined to the topology and left quickly and ms invokes to update 
> topology entry was reordered.
> # data nodes was not updated immediately to [A,B,C].
> We think that we can ignore this bug because eventually it doesn't break the 
> consistency of the data nodes. For this purpose we need to change the invoke 
> condition:
> `value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
>  instead of
> `value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() 
> - 1))`
> * Need to return ms invoke futures from WatchListener#onUpdate method of the 
> data nodes listener.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-10-13 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774878#comment-17774878
 ] 

Sergey Uttsel edited comment on IGNITE-20317 at 10/13/23 11:45 AM:
---

Some action points of the ticket were not implemented:
{code:java}
LogicalTopologyEventListener to update logical topology.
we need to change the invoke condition:
`value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
 instead of
`value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() - 
1))`
{code}
{code:java}
DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener watch 
listener to update pending assignments.{code}


was (Author: sergey uttsel):
Some action points of the ticket were not implemented:

 
{code:java}
LogicalTopologyEventListener to update logical topology.
we need to change the invoke condition:
`value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
 instead of
`value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() - 
1))`
{code}
 
{code:java}
DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener watch 
listener to update pending assignments.{code}

> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Therefore 
> several invokes for example on createZone and alterZone can be reordered. 
> Currently it does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # DistributionZoneManager#onUpdateScaleUp
> # DistributionZoneManager#onUpdateScaleDown
> -DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
> replicas update.-
> -LogicalTopologyEventListener to update logical topology.-
> -DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
> watch listener to update pending assignments.-
> h3. *Definition of Done*
> Need to ensure event handling linearization. All immediate data nodes 
> recalculation must be returned  to the event handler.
> h3. *Implementation Notes*
> * ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
> DistributionZoneManager#onUpdateFilter and 
> DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
> listeners. So we can  just return the ms invoke future  from these methods 
> and it ensure, that this invoke will be completed within the current event 
> handling.
> * We cannot return a future from LogicalTopologyEventListener's methods. We 
> can ignore these futures. This has a drawback: we can skip a topology update:
> # topology=[A,B], dataNodes=[A,B], scaleUp=0, scaleDown=100
> # Node C was joined to the topology and left quickly and ms invokes to update 
> topology entry was reordered.
> # data nodes was not updated immediately to [A,B,C].
> We think that we can ignore this bug because eventually it doesn't break the 
> consistency of the data nodes. For this purpose we need to change the invoke 
> condition:
> `value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
>  instead of
> `value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() 
> - 1))`
> * Need to return ms invoke futures from WatchListener#onUpdate method of the 
> data nodes listener.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-10-13 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774878#comment-17774878
 ] 

Sergey Uttsel commented on IGNITE-20317:


Some action points of the ticket were not implemented:
 # LogicalTopologyEventListener to update logical topology.
 # DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
watch listener to update pending assignments.

> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Therefore 
> several invokes for example on createZone and alterZone can be reordered. 
> Currently it does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # DistributionZoneManager#onUpdateScaleUp
> # DistributionZoneManager#onUpdateScaleDown
> -DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
> replicas update.-
> -LogicalTopologyEventListener to update logical topology.-
> -DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
> watch listener to update pending assignments.-
> h3. *Definition of Done*
> Need to ensure event handling linearization. All immediate data nodes 
> recalculation must be returned  to the event handler.
> h3. *Implementation Notes*
> * ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
> DistributionZoneManager#onUpdateFilter and 
> DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
> listeners. So we can  just return the ms invoke future  from these methods 
> and it ensure, that this invoke will be completed within the current event 
> handling.
> * We cannot return a future from LogicalTopologyEventListener's methods. We 
> can ignore these futures. This has a drawback: we can skip a topology update:
> # topology=[A,B], dataNodes=[A,B], scaleUp=0, scaleDown=100
> # Node C was joined to the topology and left quickly and ms invokes to update 
> topology entry was reordered.
> # data nodes was not updated immediately to [A,B,C].
> We think that we can ignore this bug because eventually it doesn't break the 
> consistency of the data nodes. For this purpose we need to change the invoke 
> condition:
> `value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
>  instead of
> `value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() 
> - 1))`
> -* Need to return ms invoke futures from WatchListener#onUpdate method of the 
> data nodes listener.-



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-20599) Implement a 'not' operation in the meta storage dsl.

2023-10-09 Thread Sergey Uttsel (Jira)
Sergey Uttsel created IGNITE-20599:
--

 Summary: Implement a 'not' operation in the meta storage dsl.
 Key: IGNITE-20599
 URL: https://issues.apache.org/jira/browse/IGNITE-20599
 Project: Ignite
  Issue Type: Improvement
Reporter: Sergey Uttsel


*Motivation*

In https://issues.apache.org/jira/browse/IGNITE-20561 we need to create a 
condition for an ms invoke with negation. We could do this in two ways:
{code:java}
and(
    notExists(dataNodes(zoneId)),
    notTombstone(dataNodes(zoneId))
){code}
or
{code:java}
not(
    or(
        exists(dataNodes(zoneId)),
        tombstone(dataNodes(zoneId))
    )
){code}
But there are no `notTombstone` or `not` methods in the meta storage dsl.

 

I propose to implement a `not` operation because it is a more general approach 
and can be reused with other conditions.

*Definition of done*

The `not` operation is implemented.
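For illustration, a self-contained toy model (these are not the real meta storage DSL types) showing why a general `not` combinator composes with any existing condition, which is the reuse argument above:
{code:java}
import java.util.function.Predicate;

public class NotConditionSketch {
    // Stand-in for a meta storage condition evaluated against an entry.
    interface Condition extends Predicate<Entry> {}

    // Toy entry state: whether the key is present and whether it is a tombstone.
    static final class Entry {
        final boolean present;
        final boolean tombstone;

        Entry(boolean present, boolean tombstone) {
            this.present = present;
            this.tombstone = tombstone;
        }

        boolean present() { return present; }
        boolean tombstone() { return tombstone; }
    }

    static Condition exists() {
        return Entry::present;
    }

    static Condition tombstone() {
        return Entry::tombstone;
    }

    static Condition or(Condition a, Condition b) {
        return e -> a.test(e) || b.test(e);
    }

    // The proposed general-purpose negation: it works for any condition,
    // so a dedicated notTombstone() is not needed.
    static Condition not(Condition c) {
        return e -> !c.test(e);
    }

    public static void main(String[] args) {
        // not(or(exists, tombstone)) -- the shape proposed in the ticket.
        Condition zoneCanBeCreated = not(or(exists(), tombstone()));

        System.out.println(zoneCanBeCreated.test(new Entry(false, false))); // true
        System.out.println(zoneCanBeCreated.test(new Entry(true, false)));  // false
        System.out.println(zoneCanBeCreated.test(new Entry(false, true)));  // false
    }
}
{code}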



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20561) Change condition for DistributionZonesUtil#triggerKeyConditionForZonesChanges to use ConditionType#TOMBSTONE

2023-10-09 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20561:
---
Description: 
*Motivation*
Currently we use zonesChangeTriggerKey in 
DistributionZonesUtil#triggerKeyConditionForZonesChanges to create the condition 
that initializes the zone's meta storage keys on zone creation and removes these 
keys on zone drop. It causes some issues:
 # we cannot remove zonesChangeTriggerKey to ensure that the zone will not be 
recreated on DZM restart;
 # it doesn't work properly now because it is possible that on DZM restart the 
zone will be recreated with a revision which is higher than the original zone 
create revision.

*Implementation notes*
To fix it we need to get rid of zonesChangeTriggerKey and use the dataNodes ms 
key on zone create and zone drop, so the condition for zone creation will be:
{code:java}
and(
notExists(dataNodes(zoneId)),
notTombstone(dataNodes(zoneId))
){code}

and for a zone drop:
{code:java}
exists(dataNodes(zoneId)){code}
 
*Definition of done*
Got rid of the meta storage zonesChangeTriggerKey key.

  was:
*Motivation*
Currently we use zonesChangeTriggerKey in 
DistributionZonesUtil#triggerKeyConditionForZonesChanges to create condition to 
initialize the zone's meta storage keys on a zone creation and to remove these 
keys on a zone drop. It cause some issues: # we cannot remove 
zonesChangeTriggerKey to ensure that on the zone will not recreated on DZM 
restart
 # it doesn't work properly now because it possible that on DZM restart the 
zone will be recreated with the revision which is higher than original the zone 
create revision.

 
*Implementation notes*
To fix it we need to get rid of zonesChangeTriggerKey and use a dataNodes ms 
key on a zone create and a zone drop.
so the condition for a zone creation will be:
and(
notExists(dataNodes(zoneId)),
notTombstone(dataNodes(zoneId))
)
and for a zone drop:
exists(dataNodes(zoneId))
 
*Definition of done*
Got rid of the meta storage zonesChangeTriggerKey key.


> Change condition for DistributionZonesUtil#triggerKeyConditionForZonesChanges 
> to use  ConditionType#TOMBSTONE
> -
>
> Key: IGNITE-20561
> URL: https://issues.apache.org/jira/browse/IGNITE-20561
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> Currently we use zonesChangeTriggerKey in 
> DistributionZonesUtil#triggerKeyConditionForZonesChanges to create the 
> condition that initializes the zone's meta storage keys on zone creation and 
> removes these keys on zone drop. It causes some issues:
>  # we cannot remove zonesChangeTriggerKey to ensure that the zone will not 
> be recreated on DZM restart;
>  # it doesn't work properly now because it is possible that on DZM restart 
> the zone will be recreated with a revision which is higher than the original 
> zone create revision.
> *Implementation notes*
> To fix it we need to get rid of zonesChangeTriggerKey and use a dataNodes ms 
> key on a zone create and a zone drop.
> so the condition for a zone creation will be:
> {code:java}
> and(
> notExists(dataNodes(zoneId)),
> notTombstone(dataNodes(zoneId))
> ){code}
> and for a zone drop:
> {code:java}
> exists(dataNodes(zoneId)){code}
>  
> *Definition of done*
> Got rid of the meta storage zonesChangeTriggerKey key.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20561) Change condition for DistributionZonesUtil#triggerKeyConditionForZonesChanges to use ConditionType#TOMBSTONE

2023-10-09 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20561:
---
Description: 
*Motivation*
Currently we use zonesChangeTriggerKey in 
DistributionZonesUtil#triggerKeyConditionForZonesChanges to create condition to 
initialize the zone's meta storage keys on a zone creation and to remove these 
keys on a zone drop. It cause some issues: # we cannot remove 
zonesChangeTriggerKey to ensure that on the zone will not recreated on DZM 
restart
 # it doesn't work properly now because it possible that on DZM restart the 
zone will be recreated with the revision which is higher than original the zone 
create revision.

 
*Implementation notes*
To fix it we need to get rid of zonesChangeTriggerKey and use a dataNodes ms 
key on a zone create and a zone drop.
so the condition for a zone creation will be:
and(
notExists(dataNodes(zoneId)),
notTombstone(dataNodes(zoneId))
)
and for a zone drop:
exists(dataNodes(zoneId))
 
*Definition of done*
Got rid of the meta storage zonesChangeTriggerKey key.

  was:We need to use {{ConditionType#TOMBSTONE}} in 
{{DistributionZonesUtil#triggerKeyConditionForZonesChanges}} when we initialise 
keys for zones in MS


> Change condition for DistributionZonesUtil#triggerKeyConditionForZonesChanges 
> to use  ConditionType#TOMBSTONE
> -
>
> Key: IGNITE-20561
> URL: https://issues.apache.org/jira/browse/IGNITE-20561
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> Currently we use zonesChangeTriggerKey in 
> DistributionZonesUtil#triggerKeyConditionForZonesChanges to create the condition 
> that initializes the zone's meta storage keys on a zone creation and removes 
> these keys on a zone drop. It causes some issues:
>  # we cannot remove zonesChangeTriggerKey, so we cannot ensure that the zone 
> will not be recreated on a DZM restart
>  # it doesn't work properly now, because on a DZM restart the zone may be 
> recreated with a revision that is higher than the original zone creation revision.
>  
> *Implementation notes*
> To fix it we need to get rid of zonesChangeTriggerKey and use the dataNodes ms 
> key on a zone creation and a zone drop.
> So the condition for a zone creation will be:
> and(
> notExists(dataNodes(zoneId)),
> notTombstone(dataNodes(zoneId))
> )
> and for a zone drop:
> exists(dataNodes(zoneId))
>  
> *Definition of done*
> Got rid of the meta storage zonesChangeTriggerKey key.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20536) No-op handlers for StripedDisruptor.StripeEntryHandler#subscribers

2023-10-06 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20536:
---
Description: 
h3. Motivation

In https://issues.apache.org/jira/browse/IGNITE-20397 we discussed that it is 
possible to get null handler in StripedDisruptor.StripeEntryHandler#onEvent on 
a table drop. And we start to use a log warning instead of an assert.

But this is not the best solution. We still need to assert that handler is not 
null on first event for the partition. And we need to skip events if the 
partition was removed. So we need:
 # to add assert that `handler != null`,
 # on StripedDisruptor.StripeEntryHandler#unsubscribe put a no-op handler to a 
subscribers map instead of remove it,
 # to remove the no-op handler when there are no events for this handler.

h3. Definition of done:
 # assert that `handler != null` is added,
 # no-op handler on StripedDisruptor.StripeEntryHandler#unsubscribe,
 # remove handler when it is not needed
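
A self-contained sketch of the proposed scheme (the class below and its event/key types are simplified placeholders, not the real jraft sources; only the lmax-disruptor EventHandler interface is real):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.lmax.disruptor.EventHandler;

class StripeEntryHandlerSketch implements EventHandler<String> {
    /** Per-group handlers; in the real code the key would be the event's nodeId. */
    private final Map<String, EventHandler<String>> subscribers = new ConcurrentHashMap<>();

    /** Shared no-op handler installed when a group unsubscribes. */
    private final EventHandler<String> noopHandler = (event, sequence, endOfBatch) -> {
        // The partition was already removed, so it is safe to drop the event.
    };

    void subscribe(String groupId, EventHandler<String> handler) {
        subscribers.put(groupId, handler);
    }

    void unsubscribe(String groupId) {
        // Keep the key in the map so that late events do not hit a missing handler.
        subscribers.put(groupId, noopHandler);
    }

    @Override
    public void onEvent(String event, long sequence, boolean endOfBatch) throws Exception {
        // Here the event itself plays the role of the group id.
        EventHandler<String> handler = subscribers.get(event);

        // With the no-op handler kept in the map, the assert can be restored.
        assert handler != null : "No handler registered for group " + event;

        handler.onEvent(event, sequence, endOfBatch);
    }
}{code}
The third step (evicting the no-op entry once no more events arrive for it) is intentionally left out of the sketch.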

  was:
h3. Motivation

In https://issues.apache.org/jira/browse/IGNITE-20397 we discussed that it is 
possible to get null handler in StripedDisruptor.StripeEntryHandler#onEvent on 
a table drop. And we start to use a log warning instead of an assert.

But this is not the best solution. We still need to assert that handler is not 
null on first event for the partition. And we need to skip events if the 
partition was removed. So we need:
 # to add assert that `handler != null`,
 # on StripedDisruptor.StripeEntryHandler#unsubscribe put a no-op handler to a 
subscribers map instead of remove it,
 # to remove the no-op handler when there are no events for this handler.

h3. Definition of done:
 # assert that `handler != null` is added,
 # no-op handler on StripedDisruptor.StripeEntryHandler#unsubscribe,
 # remove handler when it is not needed.


> No-op handlers for StripedDisruptor.StripeEntryHandler#subscribers
> --
>
> Key: IGNITE-20536
> URL: https://issues.apache.org/jira/browse/IGNITE-20536
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In https://issues.apache.org/jira/browse/IGNITE-20397 we discussed that it is 
> possible to get a null handler in StripedDisruptor.StripeEntryHandler#onEvent 
> on a table drop, and we started to use a log warning instead of an assert.
> But this is not the best solution. We still need to assert that the handler is 
> not null on the first event for a partition, and we need to skip events if the 
> partition was removed. So we need:
>  # to add an assert that `handler != null`,
>  # on StripedDisruptor.StripeEntryHandler#unsubscribe, to put a no-op handler 
> into the subscribers map instead of removing the entry,
>  # to remove the no-op handler when there are no more events for it.
> h3. Definition of done:
>  # an assert that `handler != null` is added,
>  # a no-op handler is put on StripedDisruptor.StripeEntryHandler#unsubscribe,
>  # the handler is removed when it is no longer needed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20536) No-op handlers for StripedDisruptor.StripeEntryHandler#subscribers

2023-10-06 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20536:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> No-op handlers for StripedDisruptor.StripeEntryHandler#subscribers
> --
>
> Key: IGNITE-20536
> URL: https://issues.apache.org/jira/browse/IGNITE-20536
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> In https://issues.apache.org/jira/browse/IGNITE-20397 we discussed that it is 
> possible to get a null handler in StripedDisruptor.StripeEntryHandler#onEvent 
> on a table drop, and we started to use a log warning instead of an assert.
> But this is not the best solution. We still need to assert that the handler is 
> not null on the first event for a partition, and we need to skip events if the 
> partition was removed. So we need:
>  # to add an assert that `handler != null`,
>  # on StripedDisruptor.StripeEntryHandler#unsubscribe, to put a no-op handler 
> into the subscribers map instead of removing the entry,
>  # to remove the no-op handler when there are no more events for it.
> h3. Definition of done:
>  # an assert that `handler != null` is added,
>  # a no-op handler is put on StripedDisruptor.StripeEntryHandler#unsubscribe,
>  # the handler is removed when it is no longer needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20397) java.lang.AssertionError: Group of the event is unsupported

2023-10-04 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-20397:
--

Assignee: Sergey Uttsel

> java.lang.AssertionError: Group of the event is unsupported
> ---
>
> Key: IGNITE-20397
> URL: https://issues.apache.org/jira/browse/IGNITE-20397
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> {code:java}
>   java.lang.AssertionError: Group of the event is unsupported 
> [nodeId=<11_part_18/isaat_n_2>, 
> event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at 
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
> ~[disruptor-3.3.7.jar:?]
> at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]
> The root cause:
>  # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
> StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
>  # In some cases the `subscribers` map is cleared by invocation of 
> StripedDisruptor.StripeEntryHandler#unsubscribe (for example on table 
> dropping), and then StripeEntryHandler receives event with 
> SafeTimeSyncCommandImpl.
>  # It produces an assertion error: `assert handler != null`
> The issue is not caused by the catalog feature changes.
> The issue is reproduced when I run 
> ItSqlAsynchronousApiTest#batchIncomplete with the RepeatedTest annotation; in 
> this case the cluster is not restarted after each test. It is possible to 
> reproduce it frequently by adding a Thread.sleep in StripeEntryHandler#onEvent.
> h3. Implementation notes
> We decided that we can use LOG.warn() instead of an assert because it is 
> safe to skip this event if the table was dropped.
> {code:java}
> if (handler != null) {
> handler.onEvent(event, sequence, endOfBatch || subscribers.size() > 1 && 
> !supportsBatches);
> } else {
> LOG.warn(format("Group of the event is unsupported [nodeId={}, 
> event={}]", event.nodeId(), event));
> } {code}
> This is a temporary solution, and we need to add a TODO with a link to 
> https://issues.apache.org/jira/browse/IGNITE-20536
> *Definition of done*
> There are no asserts if the handler is null.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20397) java.lang.AssertionError: Group of the event is unsupported

2023-10-04 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20397:
---
Description: 
h3. Motivation
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
 # The StripedDisruptor.StripeEntryHandler#onEvent method gets the handler from 
StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
 # In some cases the `subscribers` map is cleared by an invocation of 
StripedDisruptor.StripeEntryHandler#unsubscribe (for example on a table 
drop), and then StripeEntryHandler receives an event with 
SafeTimeSyncCommandImpl.
 # It produces an assertion error: `assert handler != null`

The issue is not caused by the catalog feature changes.

The issue is reproduced when I run ItSqlAsynchronousApiTest#batchIncomplete 
with the RepeatedTest annotation; in this case the cluster is not restarted after 
each test. It is possible to reproduce it frequently by adding a Thread.sleep in 
StripeEntryHandler#onEvent.
h3. Implementation notes

We decided that we can use LOG.warn() instead of an assert because it is safe 
to skip this event if the table was dropped.
{code:java}
if (handler != null) {
    handler.onEvent(event, sequence, endOfBatch || subscribers.size() > 1 && !supportsBatches);
} else {
    LOG.warn(format("Group of the event is unsupported [nodeId={}, event={}]", event.nodeId(), event));
}{code}
This is a temporary solution, and we need to add a TODO with a link to 
https://issues.apache.org/jira/browse/IGNITE-20536

*Definition of done*

There are no asserts if the handler is null.

  was:
h3. Motivation
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
 # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
 # In some cases the `subscribers` map is cleared by invocation of 
StripedDisruptor.StripeEntryHandler#unsubscribe (for example on table 
dropping), and then StripeEntryHandler receives event with 
SafeTimeSyncCommandImpl.
 # It produces an assertion error: `assert handler != null`

The issue is not caused by the catalog feature changes.

The issue is reproduced when I run the ItSqlAsynchronousApiTest#batchIncomplete 
with RepeatedTest annotation. In this case the cluster is not restarted after 
each tests. It possible to reproduced it frequently if add Thread.sleep in 
StripeEntryHandler#onEvent.
h3. Implementation notes

We decided that we can use LOG.warn() instead of an assert because it is safely 
to skip this event if the table was dropped.
{code:java}
if (handler != null) {
handler.onEvent(event, sequence, endOfBatch || subscribers.size() > 1 && 
!supportsBatches);
} else {
LOG.warn(format("Group of the event is unsupported [nodeId={}, event={}]", 
event.nodeId(), event));
} {code}
*Definition of done*

There is no asserts if handler is null.


> java.lang.AssertionError: Group of the event is unsupported
> ---
>
> Key: IGNITE-20397
> URL: https://issues.apache.org/jira/browse/IGNITE-20397
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> {code:java}
>   java.lang.AssertionError: Group of the event is unsupported 
> 

[jira] [Created] (IGNITE-20536) No-op handlers for StripedDisruptor.StripeEntryHandler#subscribers

2023-10-02 Thread Sergey Uttsel (Jira)
Sergey Uttsel created IGNITE-20536:
--

 Summary: No-op handlers for 
StripedDisruptor.StripeEntryHandler#subscribers
 Key: IGNITE-20536
 URL: https://issues.apache.org/jira/browse/IGNITE-20536
 Project: Ignite
  Issue Type: Bug
Reporter: Sergey Uttsel


h3. Motivation

In https://issues.apache.org/jira/browse/IGNITE-20397 we discussed that it is 
possible to get a null handler in StripedDisruptor.StripeEntryHandler#onEvent on 
a table drop, and we started to use a log warning instead of an assert.

But this is not the best solution. We still need to assert that the handler is not 
null on the first event for a partition, and we need to skip events if the 
partition was removed. So we need:
 # to add an assert that `handler != null`,
 # on StripedDisruptor.StripeEntryHandler#unsubscribe, to put a no-op handler into 
the subscribers map instead of removing the entry,
 # to remove the no-op handler when there are no more events for it.

h3. Definition of done:
 # an assert that `handler != null` is added,
 # a no-op handler is put on StripedDisruptor.StripeEntryHandler#unsubscribe,
 # the handler is removed when it is no longer needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20448) Implement strategies for failure handling

2023-10-02 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20448:
---
Reviewer: Vyacheslav Koptilin

> Implement strategies for failure handling
> -
>
> Key: IGNITE-20448
> URL: https://issues.apache.org/jira/browse/IGNITE-20448
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vyacheslav Koptilin
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Need to implement the following strategies for failure handling (see the sketch below):
>  - StopNodeFailureHandler: this handler should stop the node in case of a 
> critical error.
>  - StopNodeOrHaltFailureHandler: this handler should try to stop the node. If 
> the node cannot be stopped within a timeout, then the JVM process should be 
> stopped forcibly.
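
For illustration, a hedged sketch of the second strategy using only JDK primitives; the stop callback, the timeout value and the class name are placeholders, and the real handler would plug into the FailureHandler interface from IGNITE-20447:
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of a "stop node or halt" strategy: try a graceful stop first and
// forcibly halt the JVM if the stop does not finish within the timeout.
class StopNodeOrHaltSketch {
    private static final long STOP_TIMEOUT_MILLIS = 60_000; // placeholder value

    void onFailure(Runnable stopNode) {
        CountDownLatch stopped = new CountDownLatch(1);

        Thread stopper = new Thread(() -> {
            stopNode.run();       // graceful node stop
            stopped.countDown();
        }, "node-stopper");
        stopper.start();

        try {
            if (!stopped.await(STOP_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS)) {
                // The node could not be stopped in time: kill the JVM process forcibly.
                Runtime.getRuntime().halt(1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            Runtime.getRuntime().halt(1);
        }
    }
}{code}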



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20448) Implement strategies for failure handling

2023-10-02 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-20448:
--

Assignee: Sergey Uttsel  (was: Vyacheslav Koptilin)

> Implement strategies for failure handling
> -
>
> Key: IGNITE-20448
> URL: https://issues.apache.org/jira/browse/IGNITE-20448
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vyacheslav Koptilin
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Need to implement the following strategies for failure handling:
>  - StopNodeFailureHandler: this handler should stop the node in case of a 
> critical error.
>  - StopNodeOrHaltFailureHandler: this handler should try to stop the node. If 
> the node cannot be stopped within a timeout, then the JVM process should be 
> stopped forcibly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20448) Implement strategies for failure handling

2023-10-02 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-20448:
--

Assignee: Vyacheslav Koptilin  (was: Sergey Uttsel)

> Implement strategies for failure handling
> -
>
> Key: IGNITE-20448
> URL: https://issues.apache.org/jira/browse/IGNITE-20448
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vyacheslav Koptilin
>Assignee: Vyacheslav Koptilin
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Need to implement the following strategies for failure handling:
>  - StopNodeFailureHandler: this handler should stop the node in case of a 
> critical error.
>  - StopNodeOrHaltFailureHandler: this handler should try to stop the node. If 
> the node cannot be stopped within a timeout, then the JVM process should be 
> stopped forcibly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20397) java.lang.AssertionError: Group of the event is unsupported

2023-09-29 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20397:
---
Description: 
h3. Motivation
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
 # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
 # In some cases the `subscribers` map is cleared by invocation of 
StripedDisruptor.StripeEntryHandler#unsubscribe (for example on table 
dropping), and then StripeEntryHandler receives event with 
SafeTimeSyncCommandImpl.
 # It produces an assertion error: `assert handler != null`

The issue is not caused by the catalog feature changes.

The issue is reproduced when I run the ItSqlAsynchronousApiTest#batchIncomplete 
with RepeatedTest annotation. In this case the cluster is not restarted after 
each tests. It possible to reproduced it frequently if add Thread.sleep in 
StripeEntryHandler#onEvent.
h3. Implementation notes

We decided that we can use LOG.warn() instead of an assert because it is safely 
to skip this event if the table was dropped.
{code:java}
if (handler != null) {
handler.onEvent(event, sequence, endOfBatch || subscribers.size() > 1 && 
!supportsBatches);
} else {
LOG.warn(format("Group of the event is unsupported [nodeId={}, event={}]", 
event.nodeId(), event));
} {code}
*Definition of done*

There is no asserts if handler is null.

  was:
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
 # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
 # In some cases the `subscribers` map is cleared by invocation of 
StripedDisruptor.StripeEntryHandler#unsubscribe (for example on table 
dropping), and then StripeEntryHandler receives event with 
SafeTimeSyncCommandImpl.
 # It produces an assertion error: `assert handler != null`

The issue is not caused by the catalog feature changes.

The issue is reproduced when I run the ItSqlAsynchronousApiTest#batchIncomplete 
with RepeatedTest annotation. In this case the cluster is not restarted after 
each tests. It possible to reproduced it frequently if add Thread.sleep in 
StripeEntryHandler#onEvent.

We decided that we can use LOG.warn() instead of an assert:
{code:java}
if (handler != null) {
handler.onEvent(event, sequence, endOfBatch || subscribers.size() > 1 && 
!supportsBatches);
} else {
LOG.warn(format("Group of the event is unsupported [nodeId={}, event={}]", 
event.nodeId(), event));
} {code}


> java.lang.AssertionError: Group of the event is unsupported
> ---
>
> Key: IGNITE-20397
> URL: https://issues.apache.org/jira/browse/IGNITE-20397
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> {code:java}
>   java.lang.AssertionError: Group of the event is unsupported 
> [nodeId=<11_part_18/isaat_n_2>, 
> event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at 
> 

[jira] [Assigned] (IGNITE-20448) Implement strategies for failure handling

2023-09-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-20448:
--

Assignee: Sergey Uttsel

> Implement strategies for failure handling
> -
>
> Key: IGNITE-20448
> URL: https://issues.apache.org/jira/browse/IGNITE-20448
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vyacheslav Koptilin
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> Need to implement the following strategies for failure handling:
>  - StopNodeFailureHandler: this handler should stop the node in case of a 
> critical error.
>  - StopNodeOrHaltFailureHandler: this handler should try to stop the node. If 
> the node cannot be stopped within a timeout, then the JVM process should be 
> stopped forcibly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20447) Introduce a new failure handling component

2023-09-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20447:
---
Reviewer: Mirza Aliev

> Introduce a new failure handling component
> --
>
> Key: IGNITE-20447
> URL: https://issues.apache.org/jira/browse/IGNITE-20447
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vyacheslav Koptilin
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Let's add a new component `failure` to Apache Ignite 3 and add base 
> interfaces to this component.
> *Definition of done:*
>  - introduced a new module to Ignite 3 codebase
>  - introduced a new Ignite component - _FailureProcessor_ - with a minimal no-op 
> implementation. This component is responsible for processing critical errors.
>  - introduced a new _FailureHandler_ interface. An implementation of this 
> interface represents a concrete strategy for handling errors.
>  - introduced a new enum _FailureType_ that describes a possible type of 
> failure. The following types can be considered as a starting point: 
> _CRITICAL_ERROR_, _SYSTEM_WORKER_TERMINATION_, _SYSTEM_WORKER_BLOCKED_, 
> _SYSTEM_CRITICAL_OPERATION_TIMEOUT_
>  - introduced a new class _FailureContext_ that contains information about the 
> failure type and the exception.
> *Implementation notes:*
> All these classes and interfaces should be a part of the internal API, because 
> the end user should not provide a custom implementation of the failure 
> handler; Apache Ignite should provide a closed list of handlers out of the 
> box (see the sketch below).
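
A minimal sketch of the base types listed in the definition of done; only the names come from the ticket, while the method signatures and the default no-op behavior are assumptions:
{code:java}
/** Possible type of a failure. */
enum FailureType {
    CRITICAL_ERROR,
    SYSTEM_WORKER_TERMINATION,
    SYSTEM_WORKER_BLOCKED,
    SYSTEM_CRITICAL_OPERATION_TIMEOUT
}

/** Information about a failure: its type and the causing exception. */
class FailureContext {
    private final FailureType type;
    private final Throwable error;

    FailureContext(FailureType type, Throwable error) {
        this.type = type;
        this.error = error;
    }

    FailureType type() {
        return type;
    }

    Throwable error() {
        return error;
    }
}

/** A concrete strategy for handling critical errors. */
interface FailureHandler {
    void onFailure(FailureContext context);
}

/** Component responsible for processing critical errors; no-op handler by default. */
class FailureProcessor {
    private final FailureHandler handler;

    FailureProcessor() {
        this(context -> { /* no-op */ });
    }

    FailureProcessor(FailureHandler handler) {
        this.handler = handler;
    }

    void process(FailureContext context) {
        handler.onFailure(context);
    }
}{code}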



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20397) java.lang.AssertionError: Group of the event is unsupported

2023-09-25 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20397:
---
Description: 
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
 # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
 # In some cases the `subscribers` map is cleared by invocation of 
StripedDisruptor.StripeEntryHandler#unsubscribe, and then StripeEntryHandler 
receives event with SafeTimeSyncCommandImpl.
 # It produces an assertion error: `assert handler != null`

The issue is not caused by the catalog feature changes.

It is possible to reproduce it by adding a Thread.sleep in StripeEntryHandler#onEvent.

UPD:

The issue is reproduced when I run ItSqlAsynchronousApiTest#batchIncomplete 
with the RepeatedTest annotation; in this case the cluster is not restarted after 
each test.

When I change the test class to "start cluster, create table, drop table, stop 
cluster", the issue is not reproduced.

We decided that we can use LOG.warn() instead of an assert:
 * if handler == null, log a warning instead of failing the assert,
 * else call handler.onEvent.

  was:
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
 # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
 # In some cases the `subscribers` map is cleared by invocation of 
StripedDisruptor.StripeEntryHandler#unsubscribe, and then StripeEntryHandler 
receives event with SafeTimeSyncCommandImpl.
 # It produces an assertion error: `assert handler != null`

The issue is not caused by the catalog feature changes.

It possible to reproduced it if add Thread.sleep in StripeEntryHandler#onEvent.

Originally it was reproduced on a table dropping. But it possible to reproduce 
it on a table creation if set "IDLE_SAFE_TIME_PROPAGATION_PERIOD_MILLISECONDS = 
500;".


> java.lang.AssertionError: Group of the event is unsupported
> ---
>
> Key: IGNITE-20397
> URL: https://issues.apache.org/jira/browse/IGNITE-20397
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> {code:java}
>   java.lang.AssertionError: Group of the event is unsupported 
> [nodeId=<11_part_18/isaat_n_2>, 
> event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at 
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
> ~[disruptor-3.3.7.jar:?]
> at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]
> The root cause:
>  # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
> StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
>  # In some cases the `subscribers` map is cleared by 

[jira] [Updated] (IGNITE-20397) java.lang.AssertionError: Group of the event is unsupported

2023-09-22 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20397:
---
Description: 
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
 # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
 # In some cases the `subscribers` map is cleared by invocation of 
StripedDisruptor.StripeEntryHandler#unsubscribe, and then StripeEntryHandler 
receives event with SafeTimeSyncCommandImpl.
 # It produces an assertion error: `assert handler != null`

The issue is not caused by the catalog feature changes.

It is possible to reproduce it by adding a Thread.sleep in StripeEntryHandler#onEvent.

Originally it was reproduced on a table drop, but it is also possible to reproduce 
it on a table creation if "IDLE_SAFE_TIME_PROPAGATION_PERIOD_MILLISECONDS = 500;" 
is set.

  was:
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
 # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
 # In some cases the `subscribers` map is cleared by invocation of 
StripedDisruptor.StripeEntryHandler#unsubscribe, and then StripeEntryHandler 
receives event with SafeTimeSyncCommandImpl.
 # It produces an assertion error: `assert handler != null`


> java.lang.AssertionError: Group of the event is unsupported
> ---
>
> Key: IGNITE-20397
> URL: https://issues.apache.org/jira/browse/IGNITE-20397
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> {code:java}
>   java.lang.AssertionError: Group of the event is unsupported 
> [nodeId=<11_part_18/isaat_n_2>, 
> event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at 
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
> ~[disruptor-3.3.7.jar:?]
> at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]
> The root cause:
>  # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
> StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
>  # In some cases the `subscribers` map is cleared by invocation of 
> StripedDisruptor.StripeEntryHandler#unsubscribe, and then StripeEntryHandler 
> receives event with SafeTimeSyncCommandImpl.
>  # It produces an assertion error: `assert handler != null`
> The issue is not caused by the catalog feature changes.
> It is possible to reproduce it by adding a Thread.sleep in 
> StripeEntryHandler#onEvent.
> Originally it was reproduced on a table drop, but it is also possible to 
> reproduce it on a table creation if 
> "IDLE_SAFE_TIME_PROPAGATION_PERIOD_MILLISECONDS = 500;" is set.



--
This message was sent by Atlassian Jira

[jira] [Updated] (IGNITE-20412) Fix ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart

2023-09-22 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20412:
---
Description: 
h3. Motivation

org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
 started to fail in the catalog-feature branch and fails in the main branch 
after catalog-feature was merged.

[https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=]
{code:java}
java.lang.AssertionError:
Expected: is <[]>
 but: was <[A]>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at 
org.apache.ignite.internal.distributionzones.DistributionZonesTestUtil.assertValueInStorage(DistributionZonesTestUtil.java:459)
at 
org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest.testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:539)
{code}
h3. Implementation notes

The root cause:
 # This test changes the metaStorageManager behavior so that it throws an expected 
exception on ms.invoke.
 # The test alters the zone with a new filter.
 # DistributionZoneManager#onUpdateFilter returns a future from 
saveDataNodesToMetaStorageOnScaleUp(zoneId, causalityToken).
 # That future is completed exceptionally, so WatchProcessor#notificationFuture 
is completed exceptionally as well.
 # Subsequent updates will not be handled properly because notificationFuture is 
completed exceptionally.

We have already created tickets about exception handling:
 * https://issues.apache.org/jira/browse/IGNITE-14693
 * https://issues.apache.org/jira/browse/IGNITE-14611

The test scenario is incorrect because the node should be stopped (by the failure 
handler) if the ms.invoke fails. We need to rewrite the test when the DZM restart 
logic is updated.

  was:
h3. Motivation

org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
 started to fall in the catalog-feature branch and fails in the main branch 
after catalog-feature is merged

https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=

{code:java}
java.lang.AssertionError:
Expected: is <[]>
 but: was <[A]>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at 
org.apache.ignite.internal.distributionzones.DistributionZonesTestUtil.assertValueInStorage(DistributionZonesTestUtil.java:459)
at 
org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest.testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:539)
{code}

h3. Implementation notes

The root cause:
# This test changes metaStorageManager behavior and it throws expected 
exception on ms.invoke.
# The test alters zone with new filter.
# DistributionZoneManager#onUpdateFilter return a future from 
saveDataNodesToMetaStorageOnScaleUp(zoneId, causalityToken)
# The future is completed exceptionally and WatchProcessor#notificationFuture 
will be completed exceptionally.
# Next updates will not be handled properly because notificationFuture is 
completed exceptionally.

We have already created tickets obout exception handling:
* https://issues.apache.org/jira/browse/IGNITE-14693 
* https://issues.apache.org/jira/browse/IGNITE-14611 

I think the test scenario is incorrect because the node should be stopped (by 
failure handler) if the ms.invoke failed.


> Fix 
> ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
> 
>
> Key: IGNITE-20412
> URL: https://issues.apache.org/jira/browse/IGNITE-20412
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> h3. Motivation
> org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
>  started to fail in the catalog-feature branch and fails in the main branch 
> after catalog-feature was merged.
> [https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=]
> {code:java}
> java.lang.AssertionError:
> Expected: is <[]>
>  but: was <[A]>
> at 

[jira] [Assigned] (IGNITE-20447) Introduce a new failure handling component

2023-09-22 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-20447:
--

Assignee: Sergey Uttsel

> Introduce a new failure handling component
> --
>
> Key: IGNITE-20447
> URL: https://issues.apache.org/jira/browse/IGNITE-20447
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vyacheslav Koptilin
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> Let's add a new component `failure` to Apache Ignite 3 and add base 
> interfaces to this component.
> *Definition of done:*
>  - introduced a new module to Ignite 3 codebase
>  - introduced a new Ignite component - _FailureProcessor _with minimal no-op 
> implementation. This component is responsible for processing critical errors.
>  - introduced a new _FailureHandler _interface. An implementation of this 
> interface represents a concrete strategy for handling errors.
>  - introduced a new enum _FailureType _that describes a possible type of 
> failure. The following types can be considered as a starting point: 
> _CRITICAL_ERROR_, _SYSTEM_WORKER_TERMINATION_, _SYSTEM_WORKER_BLOCKED_, 
> _SYSTEM_CRITICAL_OPERATION_TIMEOUT_
>  - introduced a new class _FailureContext _that contains information about 
> failure type and exception.
> *Implementation notes:*
> All these classes and interfaces should be a part of the internal API, because 
> the end user should not provide a custom implementation of the failure 
> handler; Apache Ignite should provide a closed list of handlers out of the 
> box.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20397) java.lang.AssertionError: Group of the event is unsupported

2023-09-19 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20397:
---
Description: 
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]

The root cause:
 # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
 # In some cases the `subscribers` map is cleared by invocation of 
StripedDisruptor.StripeEntryHandler#unsubscribe, and then StripeEntryHandler 
receives event with SafeTimeSyncCommandImpl.
 # It produces an assertion error: `assert handler != null`

  was:
{code:java}
  java.lang.AssertionError: Group of the event is unsupported 
[nodeId=<11_part_18/isaat_n_2>, 
event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
 ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
~[disruptor-3.3.7.jar:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true


> java.lang.AssertionError: Group of the event is unsupported
> ---
>
> Key: IGNITE-20397
> URL: https://issues.apache.org/jira/browse/IGNITE-20397
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> {code:java}
>   java.lang.AssertionError: Group of the event is unsupported 
> [nodeId=<11_part_18/isaat_n_2>, 
> event=org.apache.ignite.raft.jraft.core.NodeImpl$LogEntryAndClosure@653d84a]
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:224)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:191)
>  ~[ignite-raft-3.0.0-SNAPSHOT.jar:?]
> at 
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) 
> ~[disruptor-3.3.7.jar:?]
> at java.lang.Thread.run(Thread.java:834) ~[?:?] {code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7498320?expandCode+Inspection=true=true=false=true=false=true]
> The root cause:
>  # StripedDisruptor.StripeEntryHandler#onEvent method gets handler from 
> StripedDisruptor.StripeEntryHandler#subscribers by event.nodeId().
>  # In some cases the `subscribers` map is cleared by invocation of 
> StripedDisruptor.StripeEntryHandler#unsubscribe, and then StripeEntryHandler 
> receives event with SafeTimeSyncCommandImpl.
>  # It produces an assertion error: `assert handler != null`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20412) Fix ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart

2023-09-18 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20412:
---
Description: 
h3. Motivation

org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
 started to fail in the catalog-feature branch and fails in the main branch 
after catalog-feature was merged.

https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=

{code:java}
java.lang.AssertionError:
Expected: is <[]>
 but: was <[A]>
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at 
org.apache.ignite.internal.distributionzones.DistributionZonesTestUtil.assertValueInStorage(DistributionZonesTestUtil.java:459)
at 
org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest.testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:539)
{code}

h3. Implementation notes

The root cause:
# This test changes the metaStorageManager behavior so that it throws an expected 
exception on ms.invoke.
# The test alters the zone with a new filter.
# DistributionZoneManager#onUpdateFilter returns a future from 
saveDataNodesToMetaStorageOnScaleUp(zoneId, causalityToken).
# That future is completed exceptionally, so WatchProcessor#notificationFuture 
is completed exceptionally as well.
# Subsequent updates will not be handled properly because notificationFuture is 
completed exceptionally.

We have already created tickets about exception handling:
* https://issues.apache.org/jira/browse/IGNITE-14693 
* https://issues.apache.org/jira/browse/IGNITE-14611 

I think the test scenario is incorrect because the node should be stopped (by the 
failure handler) if the ms.invoke fails.

  was:
*org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart*
 started to fall in the 
[catalog-feature|https://github.com/apache/ignite-3/tree/catalog-feature] 
branch, and on other branches that are created from it branch, need to fix it.

https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=


> Fix 
> ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
> 
>
> Key: IGNITE-20412
> URL: https://issues.apache.org/jira/browse/IGNITE-20412
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> h3. Motivation
> org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
>  started to fail in the catalog-feature branch and fails in the main branch 
> after catalog-feature was merged.
> https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=
> {code:java}
> java.lang.AssertionError:
> Expected: is <[]>
>  but: was <[A]>
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
> at 
> org.apache.ignite.internal.distributionzones.DistributionZonesTestUtil.assertValueInStorage(DistributionZonesTestUtil.java:459)
> at 
> org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest.testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart(ItIgniteDistributionZoneManagerNodeRestartTest.java:539)
> {code}
> h3. Implementation notes
> The root cause:
> # This test changes metaStorageManager behavior and it throws expected 
> exception on ms.invoke.
> # The test alters zone with new filter.
> # DistributionZoneManager#onUpdateFilter return a future from 
> saveDataNodesToMetaStorageOnScaleUp(zoneId, causalityToken)
> # The future is completed exceptionally and WatchProcessor#notificationFuture 
> will be completed exceptionally.
> # Next updates will not be handled properly because notificationFuture is 
> completed exceptionally.
> We have already created tickets about exception handling:
> * https://issues.apache.org/jira/browse/IGNITE-14693 
> * https://issues.apache.org/jira/browse/IGNITE-14611 
> I think the test scenario is incorrect because the node should be stopped (by 
> failure handler) if the ms.invoke failed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20412) Fix ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart

2023-09-15 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20412:
---
Summary: Fix 
ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
  (was: Fix 
ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart
 again)

> Fix 
> ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
> 
>
> Key: IGNITE-20412
> URL: https://issues.apache.org/jira/browse/IGNITE-20412
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> *org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart*
>  started to fall in the 
> [catalog-feature|https://github.com/apache/ignite-3/tree/catalog-feature] 
> branch, and on other branches that are created from it branch, need to fix it.
> https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20412) Fix ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart

2023-09-15 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20412:
---
Description: 
*org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart*
 started to fail in the 
[catalog-feature|https://github.com/apache/ignite-3/tree/catalog-feature] 
branch and on other branches created from it; this needs to be fixed.

https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=

  was:
*org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart*
 started to fall in the 
[catalog-feature|https://github.com/apache/ignite-3/tree/catalog-feature] 
branch, and on other branches that are created from it branch, need to fix it.

https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=


> Fix 
> ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart
> 
>
> Key: IGNITE-20412
> URL: https://issues.apache.org/jira/browse/IGNITE-20412
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> *org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpsTriggeredByFilterUpdateAndNodeJoinAreRestoredAfterRestart*
>  started to fail in the 
> [catalog-feature|https://github.com/apache/ignite-3/tree/catalog-feature] 
> branch and on other branches created from it; this needs to be fixed.
> https://ci.ignite.apache.org/viewLog.html?buildId=7501721=buildResultsDiv=ApacheIgnite3xGradle_Test_RunAllTests=



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20332) Fix ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart

2023-09-12 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764063#comment-17764063
 ] 

Sergey Uttsel commented on IGNITE-20332:


I fixed it in the catalog-feature branch. I also fixed the flakiness in the main 
branch.

> Fix 
> ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart
> ---
>
> Key: IGNITE-20332
> URL: https://issues.apache.org/jira/browse/IGNITE-20332
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> *org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart*
>  started to fail in the 
> [catalog-feature|https://github.com/apache/ignite-3/tree/catalog-feature] 
> branch and on other branches created from it; this needs to be fixed.
> https://ci.ignite.apache.org/viewLog.html?buildId=7470189=ApacheIgnite3xGradle_Test_RunAllTests=true



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20332) Fix ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart

2023-09-06 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20332:
---
Reviewer: Mirza Aliev

> Fix 
> ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart
> ---
>
> Key: IGNITE-20332
> URL: https://issues.apache.org/jira/browse/IGNITE-20332
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> *org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart*
>  started to fail in the 
> [catalog-feature|https://github.com/apache/ignite-3/tree/catalog-feature] 
> branch and on other branches created from it; this needs to be fixed.
> https://ci.ignite.apache.org/viewLog.html?buildId=7470189=ApacheIgnite3xGradle_Test_RunAllTests=true



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20332) Fix ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart

2023-09-06 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-20332:
--

Assignee: Sergey Uttsel

> Fix 
> ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart
> ---
>
> Key: IGNITE-20332
> URL: https://issues.apache.org/jira/browse/IGNITE-20332
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> *org.apache.ignite.internal.distribution.zones.ItIgniteDistributionZoneManagerNodeRestartTest#testScaleUpTriggeredByFilterUpdateIsRestoredAfterRestart*
>  started to fail in the 
> [catalog-feature|https://github.com/apache/ignite-3/tree/catalog-feature] 
> branch, and on other branches created from it; need to fix it.
> https://ci.ignite.apache.org/viewLog.html?buildId=7470189=ApacheIgnite3xGradle_Test_RunAllTests=true



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-09-04 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20317:
---
Description: 
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
replicas update.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
watch listener to update pending assignments.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
* ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
DistributionZoneManager#onUpdateFilter and 
DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
listeners. So we can just return the ms invoke future from these methods, which 
ensures that the invoke is completed within the current event handling (see the 
sketch below).

* We cannot return a future from LogicalTopologyEventListener's methods. We can 
ignore these futures. This has a drawback: we can skip a topology update:
# topology=[A,B], dataNodes=[A,B], scaleUp=0, scaleDown=100
# Node C joined the topology and left quickly, and the ms invokes to update the 
topology entry were reordered.
# Data nodes were not updated immediately to [A,B,C].
We think that we can ignore this bug because eventually it doesn't break the 
consistency of the data nodes. For this purpose we need to change the invoke 
condition:
`value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
 instead of
`value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() - 1))`

* Need to return futures from WatchListener#onUpdate method of the data nodes 
listener.
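
For illustration, below is a minimal sketch of the first two bullets: the event 
handler returns the meta storage invoke future instead of ignoring it, and the 
topology update uses the 'lt' condition so that a reordered (stale) update becomes 
a no-op. The MetaStorage/Condition/Operation types and helper names here are 
simplified stand-ins, not the real Ignite API.
{code:java}
import java.util.concurrent.CompletableFuture;

/** Simplified stand-ins for the meta storage API used below (not the real Ignite interfaces). */
interface MetaStorage {
    CompletableFuture<Boolean> invoke(Condition condition, Operation onSuccess, Operation onFailure);
}

interface Condition { }

interface Operation { }

class ZoneTopologyUpdateSketch {
    private final MetaStorage metaStorage;

    ZoneTopologyUpdateSketch(MetaStorage metaStorage) {
        this.metaStorage = metaStorage;
    }

    /** Hypothetical builders mirroring value(zonesLogicalTopologyVersionKey()).lt(...) and put(...). */
    private static Condition topologyVersionLessThan(long version) {
        return new Condition() { };
    }

    private static Operation putTopology(byte[] topologyBytes, long version) {
        return new Operation() { };
    }

    private static Operation noop() {
        return new Operation() { };
    }

    /**
     * Event handler sketch: the invoke future is returned to the caller, so the current event
     * handling is considered complete only after the meta storage update completes. The 'lt'
     * condition turns a reordered update for an older topology version into a no-op instead of
     * overwriting newer data.
     */
    CompletableFuture<Boolean> onTopologyChanged(byte[] topologyBytes, long newVersion) {
        return metaStorage.invoke(
                topologyVersionLessThan(newVersion),
                putTopology(topologyBytes, newVersion),
                noop());
    }
}
{code}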

  was:
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
replicas update.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
watch listener to update pending assignments.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
DistributionZoneManager#onUpdateFilter and 
DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
listeners. So we can just return the ms invoke future from these methods, which 
ensures that the invoke is completed within the current event handling.

We cannot return a future from LogicalTopologyEventListener's methods. We can 
ignore these futures. This has a drawback: we can skip a topology update:
# topology=[A,B], dataNodes=[A,B], scaleUp=0, scaleDown=100
# Node C joined the topology and left quickly, and the ms invokes to update the 
topology entry were reordered.
# Data nodes were not updated immediately to [A,B,C].
We think that we can ignore this bug because eventually it doesn't break the 
consistency of the data nodes. For this purpose we need to change the invoke 
condition:
`value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
 instead of
`value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() - 1))`

Need to return futures from WatchListener#onUpdate method of the data nodes 
listener.


> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so 

[jira] [Updated] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-09-04 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20317:
---
Description: 
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
replicas update.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
watch listener to update pending assignments.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
* ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
DistributionZoneManager#onUpdateFilter and 
DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
listeners. So we can just return the ms invoke future from these methods, which 
ensures that the invoke is completed within the current event handling.

* We cannot return a future from LogicalTopologyEventListener's methods. We can 
ignore these futures. This has a drawback: we can skip a topology update:
# topology=[A,B], dataNodes=[A,B], scaleUp=0, scaleDown=100
# Node C joined the topology and left quickly, and the ms invokes to update the 
topology entry were reordered.
# Data nodes were not updated immediately to [A,B,C].
We think that we can ignore this bug because eventually it doesn't break the 
consistency of the data nodes. For this purpose we need to change the invoke 
condition:
`value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
 instead of
`value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() - 1))`

* Need to return ms invoke futures from WatchListener#onUpdate method of the 
data nodes listener.

  was:
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
replicas update.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
watch listener to update pending assignments.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
* ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
DistributionZoneManager#onUpdateFilter and 
DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
listeners. So we can just return the ms invoke future from these methods, which 
ensures that the invoke is completed within the current event handling.

* We cannot return a future from LogicalTopologyEventListener's methods. We can 
ignore these futures. This has a drawback: we can skip a topology update:
# topology=[A,B], dataNodes=[A,B], scaleUp=0, scaleDown=100
# Node C joined the topology and left quickly, and the ms invokes to update the 
topology entry were reordered.
# Data nodes were not updated immediately to [A,B,C].
We think that we can ignore this bug because eventually it doesn't break the 
consistency of the data nodes. For this purpose we need to change the invoke 
condition:
`value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
 instead of
`value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() - 1))`

* Need to return futures from WatchListener#onUpdate method of the data nodes 
listener.


> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are 

[jira] [Updated] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-09-04 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20317:
---
Description: 
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
replicas update.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
watch listener to update pending assignments.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
DistributionZoneManager#onUpdateFilter and 
DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
listeners. So we can just return the ms invoke future from these methods, which 
ensures that the invoke is completed within the current event handling.

We cannot return a future from LogicalTopologyEventListener's methods. We can 
ignore these futures. This has a drawback: we can skip a topology update:
# topology=[A,B], dataNodes=[A,B], scaleUp=0, scaleDown=100
# Node C joined the topology and left quickly, and the ms invokes to update the 
topology entry were reordered.
# Data nodes were not updated immediately to [A,B,C].
We think that we can ignore this bug because eventually it doesn't break the 
consistency of the data nodes. For this purpose we need to change the invoke 
condition:
`value(zonesLogicalTopologyVersionKey()).lt(longToBytes(newTopology.version()))`
 instead of
`value(zonesLogicalTopologyVersionKey()).eq(longToBytes(newTopology.version() - 1))`

Need to return futures from WatchListener#onUpdate method of the data nodes 
listener.

  was:
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
replicas update.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
watch listener to update pending assignments.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
# ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
DistributionZoneManager#onUpdateFilter and 
DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
listeners. So we can just return the ms invoke future from these methods, which 
ensures that the invoke is completed within the current event handling.
# We cannot return a future from LogicalTopologyEventListener's methods. So we 
can chain their ms invoke futures in DZM, or we can add tasks with the ms invoke 
to an executor.
# Need to return futures from WatchListener#onUpdate method of the data nodes 
listener.


> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Therefore 
> several invokes for example on createZone and alterZone can be reordered. 
> Currently it does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # DistributionZoneRebalanceEngine#onUpdateReplicas to apdate assignment on 
> replicas update.
> # 

[jira] [Updated] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-09-01 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20317:
---
Description: 
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
replicas update.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
watch listener to update pending assignments.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
# ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
DistributionZoneManager#onUpdateFilter and 
DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
listeners. So we can just return the ms invoke future from these methods, which 
ensures that the invoke is completed within the current event handling.
# We cannot return a future from LogicalTopologyEventListener's methods. So we 
can chain their ms invoke futures in DZM, or we can add tasks with the ms invoke 
to an executor.
# Need to return futures from WatchListener#onUpdate method of the data nodes 
listener.

  was:
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
replicas update.
# LogicalTopologyEventListener to update logical topology.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
# ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
DistributionZoneManager#onUpdateFilter and 
DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
listeners. So we can just return the ms invoke future from these methods, which 
ensures that the invoke is completed within the current event handling.
# We cannot return a future from LogicalTopologyEventListener's methods. So we 
can chain their ms invoke futures in DZM, or we can add tasks with the ms invoke 
to an executor.


> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Therefore 
> several invokes for example on createZone and alterZone can be reordered. 
> Currently it does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # DistributionZoneRebalanceEngine#onUpdateReplicas to apdate assignment on 
> replicas update.
> # LogicalTopologyEventListener to update logical topology.
> # DistributionZoneRebalanceEngine#createDistributionZonesDataNodesListener 
> watch listener to update pending assignments.
> h3. *Definition of Done*
> Need to ensure event handling linearization.
> h3. *Implementation Notes*
> # ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
> DistributionZoneManager#onUpdateFilter and 
> DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
> listeners. So we can  just return the ms invoke future  from these methods 
> and it ensure, that this invoke will be completed within the current event 
> handling.
> # We cannnot return future from LogicalTopologyEventListener's methods. So we 
> can chain their ms invokes 

[jira] [Updated] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-09-01 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20317:
---
Description: 
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# DistributionZoneRebalanceEngine#onUpdateReplicas to update assignments on 
replicas update.
# LogicalTopologyEventListener to update logical topology.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
# ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
DistributionZoneManager#onUpdateFilter and 
DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
listeners. So we can just return the ms invoke future from these methods, which 
ensures that the invoke is completed within the current event handling.
# We cannot return a future from LogicalTopologyEventListener's methods. So we 
can chain their ms invoke futures in DZM, or we can add tasks with the ms invoke 
to an executor.

  was:
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# LogicalTopologyEventListener to update logical topology.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
# ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete and 
DistributionZoneManager#onUpdateFilter are invoked in configuration listeners. 
So we can just return the ms invoke future from these methods, which ensures 
that the invoke is completed within the current event handling.
# We cannot return a future from LogicalTopologyEventListener's methods. So we 
can chain their ms invoke futures in DZM, or we can add tasks with the ms invoke 
to an executor.


> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Therefore 
> several invokes for example on createZone and alterZone can be reordered. 
> Currently it does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # DistributionZoneRebalanceEngine#onUpdateReplicas to apdate assignment on 
> replicas update.
> # LogicalTopologyEventListener to update logical topology.
> h3. *Definition of Done*
> Need to ensure event handling linearization.
> h3. *Implementation Notes*
> # ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete, 
> DistributionZoneManager#onUpdateFilter and 
> DistributionZoneRebalanceEngine#onUpdateReplicas are invoked in configuration 
> listeners. So we can  just return the ms invoke future  from these methods 
> and it ensure, that this invoke will be completed within the current event 
> handling.
> # We cannnot return future from LogicalTopologyEventListener's methods. So we 
> can chain their ms invokes futures in DZM or we can add tasks with ms invoke 
> to executor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-09-01 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20317:
---
Description: 
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# LogicalTopologyEventListener to update logical topology.

h3. *Definition of Done*
Need to ensure event handling linearization.

h3. *Implementation Notes*
# ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete and 
DistributionZoneManager#onUpdateFilter are invoked in configuration listeners. 
So we can just return the ms invoke future from these methods, which ensures 
that the invoke is completed within the current event handling.
# We cannot return a future from LogicalTopologyEventListener's methods. So we 
can chain their ms invoke futures in DZM, or we can add tasks with the ms invoke 
to an executor.

  was:
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# LogicalTopologyEventListener to update logical topology.

h3. *Definition of Done*
Need to return meta storage futures from event handlers to ensure event 
linearization.

h3. *Implementation Notes*
# ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete and 
DistributionZoneManager#onUpdateFilter are invoked in configuration listeners. 
So we can just return the ms invoke future from these methods, which ensures 
that the invoke is completed within the current event handling.
# We cannot return a future from LogicalTopologyEventListener's methods. So we 
can chain their ms invoke futures in DZM, or we can add tasks with the ms invoke 
to an executor.


> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Therefore 
> several invokes for example on createZone and alterZone can be reordered. 
> Currently it does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # LogicalTopologyEventListener to update logical topology.
> h3. *Definition of Done*
> Need to ensure event handling linearization.
> h3. *Implementation Notes*
> # ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete 
> and DistributionZoneManager#onUpdateFilter are invoked in configuration 
> listeners. So we can  just return the ms invoke future  from these methods 
> and it ensure, that this invoke will be completed within the current event 
> handling.
> # We cannnot return future from LogicalTopologyEventListener's methods. So we 
> can chain their ms invokes futures in DZM or we can add tasks with ms invoke 
> to executor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-09-01 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20317:
---
Description: 
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# LogicalTopologyEventListener to update logical topology.

h3. *Definition of Done*
Need to return meta storage futures from event handlers to ensure event 
linearization.

h3. *Implementation Notes*
# ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete and 
DistributionZoneManager#onUpdateFilter are invoked in configuration listeners. 
So we can just return the ms invoke future from these methods, which ensures 
that the invoke is completed within the current event handling.
# We cannot return a future from LogicalTopologyEventListener's methods. So we 
can chain their ms invoke futures in DZM, or we can add tasks with the ms invoke 
to an executor.

  was:
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# LogicalTopologyEventListener to update logical topology.

h3. *Definition of Done*
Need to return meta storage futures from event handlers to ensure event 
linearization.

h3. *Implementation Notes*
ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete and 
DistributionZoneManager#onUpdateFilter are invoked in configuration listeners. 
So we can just return the ms invoke future from these methods, which ensures 
that the invoke is completed within the current event handling.


> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Therefore 
> several invokes for example on createZone and alterZone can be reordered. 
> Currently it does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # LogicalTopologyEventListener to update logical topology.
> h3. *Definition of Done*
> Need to return meta storage futures from event handlers to ensure event 
> linearization.
> h3. *Implementation Notes*
> # ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete 
> and DistributionZoneManager#onUpdateFilter are invoked in configuration 
> listeners. So we can  just return the ms invoke future  from these methods 
> and it ensure, that this invoke will be completed within the current event 
> handling.
> # We cannnot return future from LogicalTopologyEventListener's methods. So we 
> can chain their ms invokes futures in DZM or we can add tasks with ms invoke 
> to executor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-09-01 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20317:
---
Description: 
h3. *Motivation*
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Therefore several invokes, for 
example on createZone and alterZone, can be reordered. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# LogicalTopologyEventListener to update logical topology.

h3. *Definition of Done*
Need to return meta storage futures from event handlers to ensure event 
linearization.

h3. *Implementation Notes*
ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete and 
DistributionZoneManager#onUpdateFilter are invoked in configuration listeners. 
So we can just return the ms invoke future from these methods, which ensures 
that the invoke is completed within the current event handling.

  was:
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.

Need to return meta storage futures from event handlers to ensure event 
linearization.


> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Therefore 
> several invokes for example on createZone and alterZone can be reordered. 
> Currently it does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> # LogicalTopologyEventListener to update logical topology.
> h3. *Definition of Done*
> Need to return meta storage futures from event handlers to ensure event 
> linearization.
> h3. *Implementation Notes*
> ZonesConfigurationListener#onCreate, ZonesConfigurationListener#onDelete and 
> DistributionZoneManager#onUpdateFilter are invoked in configuration 
> listeners. So we can  just return the ms invoke future  from these methods 
> and it ensure, that this invoke will be completed within the current event 
> handling.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-09-01 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20317:
---
Description: 
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.

Need to return meta storage futures from event handlers to ensure event 
linearization.

  was:
There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# Also saveDataNodesToMetaStorageOnScaleUp and 
saveDataNodesToMetaStorageOnScaleDown do invokes.

Need to return meta storage futures from event handlers to ensure event 
linearization.


> Meta storage invokes are not completed when events are handled in DZM 
> --
>
> Key: IGNITE-20317
> URL: https://issues.apache.org/jira/browse/IGNITE-20317
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> There are meta storage invokes in DistributionZoneManager in zone's 
> lifecycle. The futures of these invokes are ignored, so after the lifecycle 
> method is completed actually not all its actions are completed. Currently it 
> does the meta storage invokes in:
> # ZonesConfigurationListener#onCreate to init a zone.
> # ZonesConfigurationListener#onDelete to clean up the zone data.
> # LogicalTopologyEventListener to update logical topology.
> # DistributionZoneManager#onUpdateFilter to save data nodes in the meta 
> storage.
> Need to return meta storage futures from event handlers to ensure event 
> linearization.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-20326) Meta storage invokes are not completed when data nodes are recalculated in DZM

2023-09-01 Thread Sergey Uttsel (Jira)
Sergey Uttsel created IGNITE-20326:
--

 Summary: Meta storage invokes are not completed when data nodes 
are recalculated in DZM
 Key: IGNITE-20326
 URL: https://issues.apache.org/jira/browse/IGNITE-20326
 Project: Ignite
  Issue Type: Bug
Reporter: Sergey Uttsel


There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Such invokes are used in:
# DistributionZoneManager#saveDataNodesToMetaStorageOnScaleUp
# DistributionZoneManager#saveDataNodesToMetaStorageOnScaleDown
to recalculate data nodes when timers are fired.

Need to check whether we need to await the futures from these invokes.
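
If these futures do need to be awaited, one possible shape is sketched below 
(hypothetical method signatures only, not the actual DistributionZoneManager 
code): the recalculation returns the ms invoke future, and the next timer firing 
is chained on it so recalculations cannot overtake each other.
{code:java}
import java.util.concurrent.CompletableFuture;

/** Sketch only: hypothetical signatures, not the actual DistributionZoneManager code. */
class ScaleTimerSketch {
    /** Tail of the chain of data nodes recalculations. */
    private CompletableFuture<Void> lastRecalculation = CompletableFuture.completedFuture(null);

    /** Stand-in for saveDataNodesToMetaStorageOnScaleUp: returns the ms invoke future instead of dropping it. */
    private CompletableFuture<Void> saveDataNodesOnScaleUp(int zoneId, long revision) {
        return CompletableFuture.completedFuture(null); // would be a meta storage invoke in real code
    }

    /** Timer callback: the next recalculation starts only after the previous invoke has completed. */
    synchronized void onScaleUpTimerFired(int zoneId, long revision) {
        lastRecalculation = lastRecalculation
                .thenCompose(ignored -> saveDataNodesOnScaleUp(zoneId, revision));
    }
}
{code}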



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20303) "Raft group on the node is already started" exception when pending and planned assignment changed faster then rebalance

2023-08-31 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20303:
---
Reviewer: Kirill Gusakov

> "Raft group on the node is already started" exception when pending and 
> planned assignment changed faster then rebalance
> ---
>
> Key: IGNITE-20303
> URL: https://issues.apache.org/jira/browse/IGNITE-20303
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If many assignment changes happen quickly, then the rebalance does not have 
> time to complete for each change. In this case an exception is thrown:
> {code:java}
> 2023-08-24T16:58:51,328][ERROR][%irdt_ttqr_2%tableManager-io-10][WatchProcessor]
>  Error occurred when processing a watch event
>  org.apache.ignite.lang.IgniteInternalException: Raft group on the node is 
> already started [nodeId=RaftNodeId [groupId=1_part_0, peer=Peer 
> [consistentId=irdt_ttqr_2, idx=0]]]
>   at 
> org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:342)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:230) 
> ~[main/:?]
>   at 
> org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:203) 
> ~[main/:?]
>   at 
> org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:2361)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$98(TableManager.java:2261)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:922) 
> ~[main/:?]
>   at 
> org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$99(TableManager.java:2259)
>  ~[main/:?]
>   at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]
>   at java.lang.Thread.run(Thread.java:834) ~[?:?]
> {code}
> The reproducer is based on ItRebalanceDistributedTest#testThreeQueuedRebalances. 
> See the exception in the test log:
> {code:java}
> @Test
> void testThreeQueuedRebalances() throws Exception {
> Node node = getNode(0);
> createZone(node, ZONE_NAME, 1, 1);
> createTable(node, ZONE_NAME, TABLE_NAME);
> assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
> 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> waitPartitionAssignmentsSyncedToExpected(0, 2);
> checkPartitionNodes(0, 2);
> }
> {code}
> We can fix it by checking whether the raft node and the Replica are already 
> created before calling startPartitionRaftGroupNode and 
> startReplicaWithNewListener in TableManager#handleChangePendingAssignmentEvent.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-20317) Meta storage invokes are not completed when events are handled in DZM

2023-08-31 Thread Sergey Uttsel (Jira)
Sergey Uttsel created IGNITE-20317:
--

 Summary: Meta storage invokes are not completed when events are 
handled in DZM 
 Key: IGNITE-20317
 URL: https://issues.apache.org/jira/browse/IGNITE-20317
 Project: Ignite
  Issue Type: Bug
Reporter: Sergey Uttsel


There are meta storage invokes in DistributionZoneManager in the zone's lifecycle. 
The futures of these invokes are ignored, so after a lifecycle method completes, 
not all of its actions are actually completed. Currently the meta storage 
invokes are done in:
# ZonesConfigurationListener#onCreate to init a zone.
# ZonesConfigurationListener#onDelete to clean up the zone data.
# LogicalTopologyEventListener to update logical topology.
# DistributionZoneManager#onUpdateFilter to save data nodes in the meta storage.
# Also saveDataNodesToMetaStorageOnScaleUp and 
saveDataNodesToMetaStorageOnScaleDown do invokes.

Need to return meta storage futures from event handlers to ensure event 
linearization.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20310) Meta storage invokes are not completed when DZM start is completed

2023-08-31 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20310:
---
Description: 
There are meta storage invokes in DistributionZoneManager start. Currently the 
meta storage invokes are done in 
DistributionZoneManager#createOrRestoreZoneState:
# DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to init the 
default zone.
# DistributionZoneManager#restoreTimers, in case a filter update was handled 
before DZM stop but didn't update data nodes.

The futures of these invokes are ignored, so after the start method completes, 
not all start actions are actually completed.
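
For illustration, a minimal sketch of one possible fix shape (hypothetical method 
shapes, not the actual DistributionZoneManager code): createOrRestoreZoneState 
combines the two invoke futures, so start can complete only after both invokes 
have completed.
{code:java}
import java.util.concurrent.CompletableFuture;

/** Sketch only: hypothetical method shapes, not the actual DistributionZoneManager code. */
class ZoneManagerStartSketch {
    private CompletableFuture<Void> initDataNodesAndTriggerKeysInMetaStorage(int zoneId) {
        return CompletableFuture.completedFuture(null); // would be a meta storage invoke
    }

    private CompletableFuture<Void> restoreTimers(int zoneId) {
        return CompletableFuture.completedFuture(null); // may also issue a meta storage invoke
    }

    /** The returned future completes only after both invokes have completed, so start() can await it. */
    CompletableFuture<Void> createOrRestoreZoneState(int zoneId) {
        return CompletableFuture.allOf(
                initDataNodesAndTriggerKeysInMetaStorage(zoneId),
                restoreTimers(zoneId));
    }
}
{code}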


  was:
There are meta storage invokes in DistributionZoneManager start. Currently the 
meta storage invokes are done in 
DistributionZoneManager#createOrRestoreZoneState:
# DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to update 
data nodes.
# DistributionZoneManager#restoreTimers, in case a filter update was handled 
before DZM stop but didn't update data nodes.

The futures of these invokes are ignored, so after the start method completes, 
not all start actions are actually completed.



> Meta storage invokes are not completed  when DZM start is completed
> ---
>
> Key: IGNITE-20310
> URL: https://issues.apache.org/jira/browse/IGNITE-20310
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> There are meta storage invokes in DistributionZoneManager start. Currently it 
> does the meta storage invokes in 
> DistributionZoneManager#createOrRestoreZoneState:
> # DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to init 
> the default zone.
> # DistributionZoneManager#restoreTimers. in case when a filter update was 
> handled before DZM stop, but it didn't update data nodes.
> Futures of these invokes are ignored. So after the start method is completed 
> actually not all start actions are completed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20310) Meta storage invokes are not completed when DZM start is completed

2023-08-31 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20310:
---
Description: 
There are meta storage invokes in DistributionZoneManager start. Currently the 
meta storage invokes are done in 
DistributionZoneManager#createOrRestoreZoneState:
# DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to update 
data nodes.
# DistributionZoneManager#restoreTimers, in case a filter update was handled 
before DZM stop but didn't update data nodes.

The futures of these invokes are ignored, so after the start method completes, 
not all start actions are actually completed.


  was:
There are meta storage invokes in DistributionZoneManager start. Currently the 
meta storage invokes are done in 
DistributionZoneManager#createOrRestoreZoneState:
# DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to update 
data nodes.
# DistributionZoneManager#restoreTimers, in case a filter update was handled 
before DZM stop but didn't update data nodes.
The futures of these invokes are ignored, so after the start method completes, 
not all start actions are actually completed.



> Meta storage invokes are not completed  when DZM start is completed
> ---
>
> Key: IGNITE-20310
> URL: https://issues.apache.org/jira/browse/IGNITE-20310
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> There are meta storage invokes in DistributionZoneManager start. Currently it 
> does the meta storage invokes in 
> DistributionZoneManager#createOrRestoreZoneState:
> # DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to update 
> data nodes.
> # DistributionZoneManager#restoreTimers. in case when a filter update was 
> handled before DZM stop, but it didn't update data nodes.
> Futures of these invokes are ignored. So after the start method is completed 
> actually not all start actions are completed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20310) Meta storage invokes are not completed when DZM start is completed

2023-08-31 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20310:
---
Description: 
There are meta storage invokes in DistributionZoneManager start. Currently the 
meta storage invokes are done in 
DistributionZoneManager#createOrRestoreZoneState:
# DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to update 
data nodes.
# DistributionZoneManager#restoreTimers, in case a filter update was handled 
before DZM stop but didn't update data nodes.
The futures of these invokes are ignored, so after the start method completes, 
not all start actions are actually completed.


  was:
There are meta storage invokes in DistributionZoneManager start. Currently it 
does the meta storage invoke in case a filter update was handled before DZM 
stop but didn't update data nodes. The future of this invoke is ignored, so 
after the start method completes, not all start actions are actually completed.



> Meta storage invokes are not completed  when DZM start is completed
> ---
>
> Key: IGNITE-20310
> URL: https://issues.apache.org/jira/browse/IGNITE-20310
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> There are meta storage invokes in DistributionZoneManager start. Currently it 
> does the meta storage invokes in 
> DistributionZoneManager#createOrRestoreZoneState:
> # DistributionZoneManager#initDataNodesAndTriggerKeysInMetaStorage to update 
> data nodes.
> # DistributionZoneManager#restoreTimers. in case when a filter update was 
> handled before DZM stop, but it didn't update data nodes.
> Futures of these invokes are ignored. So after the start method is completed 
> actually not all start actions are completed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-20303) "Raft group on the node is already started" exception when pending and planned assignment changed faster then rebalance

2023-08-30 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-20303:
--

Assignee: Sergey Uttsel

> "Raft group on the node is already started" exception when pending and 
> planned assignment changed faster then rebalance
> ---
>
> Key: IGNITE-20303
> URL: https://issues.apache.org/jira/browse/IGNITE-20303
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> If many assignment changes happen quickly, then the rebalance does not have 
> time to complete for each change. In this case an exception is thrown:
> {code:java}
> 2023-08-24T16:58:51,328][ERROR][%irdt_ttqr_2%tableManager-io-10][WatchProcessor]
>  Error occurred when processing a watch event
>  org.apache.ignite.lang.IgniteInternalException: Raft group on the node is 
> already started [nodeId=RaftNodeId [groupId=1_part_0, peer=Peer 
> [consistentId=irdt_ttqr_2, idx=0]]]
>   at 
> org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:342)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:230) 
> ~[main/:?]
>   at 
> org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:203) 
> ~[main/:?]
>   at 
> org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:2361)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$98(TableManager.java:2261)
>  ~[main/:?]
>   at 
> org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:922) 
> ~[main/:?]
>   at 
> org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$99(TableManager.java:2259)
>  ~[main/:?]
>   at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]
>   at java.lang.Thread.run(Thread.java:834) ~[?:?]
> {code}
> The reproducer is based on ItRebalanceDistributedTest#testThreeQueuedRebalances. 
> See the exception in the test log:
> {code:java}
> @Test
> void testThreeQueuedRebalances() throws Exception {
> Node node = getNode(0);
> createZone(node, ZONE_NAME, 1, 1);
> createTable(node, ZONE_NAME, TABLE_NAME);
> assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
> 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> waitPartitionAssignmentsSyncedToExpected(0, 2);
> checkPartitionNodes(0, 2);
> }
> {code}
> We can fix it by checking whether the raft node and the Replica are already 
> created before calling startPartitionRaftGroupNode and 
> startReplicaWithNewListener in TableManager#handleChangePendingAssignmentEvent.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-20310) Meta storage invokes are not completed when DZM start is completed

2023-08-30 Thread Sergey Uttsel (Jira)
Sergey Uttsel created IGNITE-20310:
--

 Summary: Meta storage invokes are not completed  when DZM start is 
completed
 Key: IGNITE-20310
 URL: https://issues.apache.org/jira/browse/IGNITE-20310
 Project: Ignite
  Issue Type: Bug
Reporter: Sergey Uttsel


There are meta storage invokes in DistributionZoneManager start. Currently it 
does the meta storage invoke in case a filter update was handled before DZM 
stop but didn't update data nodes. The future of this invoke is ignored, so 
after the start method completes, not all start actions are actually completed.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20303) "Raft group on the node is already started" exception when pending and planned assignment changed faster then rebalance

2023-08-29 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20303:
---
Description: 
If many assignment changes happen quickly, then the rebalance does not have 
time to complete for each change. In this case an exception is thrown:

{code:java}
2023-08-24T16:58:51,328][ERROR][%irdt_ttqr_2%tableManager-io-10][WatchProcessor]
 Error occurred when processing a watch event
 org.apache.ignite.lang.IgniteInternalException: Raft group on the node is 
already started [nodeId=RaftNodeId [groupId=1_part_0, peer=Peer 
[consistentId=irdt_ttqr_2, idx=0]]]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:342) 
~[main/:?]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:230) 
~[main/:?]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:203) 
~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:2361)
 ~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$98(TableManager.java:2261)
 ~[main/:?]
at 
org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:922) 
~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$99(TableManager.java:2259)
 ~[main/:?]
at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
 ~[?:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?]
{code}

The reproducer is based on ItRebalanceDistributedTest#testThreeQueuedRebalances; 
see the exception in the test log:

{code:java}
@Test
void testThreeQueuedRebalances() throws Exception {
Node node = getNode(0);

createZone(node, ZONE_NAME, 1, 1);

createTable(node, ZONE_NAME, TABLE_NAME);

assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
0).size() == 1, AWAIT_TIMEOUT_MILLIS));

alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);

waitPartitionAssignmentsSyncedToExpected(0, 2);

checkPartitionNodes(0, 2);
}
{code}

We can fix it by checking whether the Raft node and the Replica are already created 
before calling startPartitionRaftGroupNode and startReplicaWithNewListener in 
TableManager#handleChangePendingAssignmentEvent.
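
For illustration, here is a minimal, self-contained sketch of the "check before 
start" idea. The guard map and the method bodies below are hypothetical 
placeholders, not the real TableManager/Loza API; in the actual code the check 
would more likely consult the Raft and Replica managers directly.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: start the partition Raft node and replica at most once
// per partition, even if pending/planned assignment events arrive repeatedly.
class PendingAssignmentHandlerSketch {
    private final Map<String, Boolean> startedPartitions = new ConcurrentHashMap<>();

    void handleChangePendingAssignmentEvent(String partitionId) {
        // putIfAbsent returns null only for the first event for this partition,
        // so a second, faster assignment change does not trigger a second start.
        if (startedPartitions.putIfAbsent(partitionId, Boolean.TRUE) == null) {
            startPartitionRaftGroupNode(partitionId);
            startReplicaWithNewListener(partitionId);
        }
    }

    private void startPartitionRaftGroupNode(String partitionId) {
        // placeholder for the real start logic
    }

    private void startReplicaWithNewListener(String partitionId) {
        // placeholder for the real start logic
    }
}
{code}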

  was:
If many changes of assignment are happened quickly then rebalance does not have 
time to be completed for each change. In this case exception is thrown:

{code:java}
2023-08-24T16:58:51,328][ERROR][%irdt_ttqr_2%tableManager-io-10][WatchProcessor]
 Error occurred when processing a watch event
 org.apache.ignite.lang.IgniteInternalException: Raft group on the node is 
already started [nodeId=RaftNodeId [groupId=1_part_0, peer=Peer 
[consistentId=irdt_ttqr_2, idx=0]]]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:342) 
~[main/:?]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:230) 
~[main/:?]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:203) 
~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:2361)
 ~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$98(TableManager.java:2261)
 ~[main/:?]
at 
org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:922) 
~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$99(TableManager.java:2259)
 ~[main/:?]
at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
 ~[?:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?]
{code}

The reproducer based on ItRebalanceDistributedTest#testThreeQueuedRebalances. 
See exception in the test log:

{code:java}
@Test
void testThreeQueuedRebalances() throws Exception {
Node node = getNode(0);

createZone(node, ZONE_NAME, 1, 1);

[jira] [Updated] (IGNITE-20279) Reordering of altering zone operations

2023-08-29 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20279:
---
Description: 
The issue shows up in a test where several zone change operations occur. In this 
case the operations can be reordered and left incomplete at the end of the test: 
the "Received update for replicas number" messages appear in the test log in the 
wrong order. The reproducer is based on 
ItRebalanceDistributedTest#testThreeQueuedRebalances:

{code:java}
@Test
void testThreeQueuedRebalances() throws Exception {
Node node = getNode(0);

createZone(node, ZONE_NAME, 1, 1);

createTable(node, ZONE_NAME, TABLE_NAME);

assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
0).size() == 1, AWAIT_TIMEOUT_MILLIS));

alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);

waitPartitionAssignmentsSyncedToExpected(0, 2);

checkPartitionNodes(0, 2);
}
{code}
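
The ticket only provides a reproducer, but as an illustration of one way to prevent 
such reordering, the sketch below serializes configuration updates by chaining each 
change after the previously submitted one. This is a generic, hypothetical pattern, 
not the actual fix in Ignite.

{code:java}
import java.util.concurrent.CompletableFuture;

// Illustrative sketch: apply zone configuration changes strictly in submission
// order by chaining each change after the previous one.
class SerialZoneUpdater {
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

    // Returns a future that completes once this change and all earlier ones applied.
    synchronized CompletableFuture<Void> submit(Runnable applyChange) {
        tail = tail.thenRun(applyChange);
        return tail;
    }
}
{code}

With chaining like this, a test could wait on the future of the last alterZone call 
before checking the assignments.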


  was:
The issue is shown in the test, where several zone change operations occur. In 
this case, the operation can be reordered and incomplete at the end of the 
test. There are messages "Received update for replicas number" in the test log 
in a wrong order. The reproducer based on 
ItRebalanceDistributedTest#testThreeQueuedRebalances. See exception in the test 
log:

{code:java}
@Test
void testThreeQueuedRebalances() throws Exception {
Node node = getNode(0);

createZone(node, ZONE_NAME, 1, 1);

createTable(node, ZONE_NAME, TABLE_NAME);

assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
0).size() == 1, AWAIT_TIMEOUT_MILLIS));

alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);

waitPartitionAssignmentsSyncedToExpected(0, 2);

checkPartitionNodes(0, 2);
}
{code}



> Reordering of altering zone operations
> --
>
> Key: IGNITE-20279
> URL: https://issues.apache.org/jira/browse/IGNITE-20279
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is shown in the test, where several zone change operations occur. 
> In this case, the operation can be reordered and incomplete at the end of the 
> test. There are messages "Received update for replicas number" in the test 
> log in a wrong order. The reproducer based on 
> ItRebalanceDistributedTest#testThreeQueuedRebalances:
> {code:java}
> @Test
> void testThreeQueuedRebalances() throws Exception {
> Node node = getNode(0);
> createZone(node, ZONE_NAME, 1, 1);
> createTable(node, ZONE_NAME, TABLE_NAME);
> assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
> 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> waitPartitionAssignmentsSyncedToExpected(0, 2);
> checkPartitionNodes(0, 2);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20279) Reordering of altering zone operations

2023-08-29 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20279:
---
Reviewer: Mirza Aliev

> Reordering of altering zone operations
> --
>
> Key: IGNITE-20279
> URL: https://issues.apache.org/jira/browse/IGNITE-20279
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is shown in the test, where several zone change operations occur. 
> In this case, the operation can be reordered and incomplete at the end of the 
> test. There are messages "Received update for replicas number" in the test 
> log in a wrong order. The reproducer based on 
> ItRebalanceDistributedTest#testThreeQueuedRebalances. See exception in the 
> test log:
> {code:java}
> @Test
> void testThreeQueuedRebalances() throws Exception {
> Node node = getNode(0);
> createZone(node, ZONE_NAME, 1, 1);
> createTable(node, ZONE_NAME, TABLE_NAME);
> assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
> 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> waitPartitionAssignmentsSyncedToExpected(0, 2);
> checkPartitionNodes(0, 2);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20279) Reordering of altering zone operations

2023-08-29 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20279:
---
Description: 
The issue is shown in the test, where several zone change operations occur. In 
this case, the operation can be reordered and incomplete at the end of the 
test. There are messages "Received update for replicas number" in the test log 
in a wrong order. The reproducer based on 
ItRebalanceDistributedTest#testThreeQueuedRebalances. See exception in the test 
log:

{code:java}
@Test
void testThreeQueuedRebalances() throws Exception {
Node node = getNode(0);

createZone(node, ZONE_NAME, 1, 1);

createTable(node, ZONE_NAME, TABLE_NAME);

assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
0).size() == 1, AWAIT_TIMEOUT_MILLIS));

alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);

waitPartitionAssignmentsSyncedToExpected(0, 2);

checkPartitionNodes(0, 2);
}
{code}


  was:
The issue shows up in a test where several zone change operations occur. On my 
laptop, the test ({{ItRebalanceDistributedTest#testThreeQueuedRebalances}}) fails 
at least twice in 30 runs.
 # The first issue is that the test does not wait for the last zone change 
operation, alterZone(node, ZONE_NAME, 2), to execute. In this case the operation 
can be incomplete at the end of the test.
 # The second issue is that the next operation may start before the previous one 
has completed.
{noformat}
2023-08-24T16:58:51,328][ERROR][%irdt_ttqr_2%tableManager-io-10][WatchProcessor]
 Error occurred when processing a watch event
 org.apache.ignite.lang.IgniteInternalException: Raft group on the node is 
already started [nodeId=RaftNodeId [groupId=1_part_0, peer=Peer 
[consistentId=irdt_ttqr_2, idx=0]]]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:342) 
~[main/:?]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:230) 
~[main/:?]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:203) 
~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:2361)
 ~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$98(TableManager.java:2261)
 ~[main/:?]
at 
org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:922) 
~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$99(TableManager.java:2259)
 ~[main/:?]
at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
 ~[?:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?]
{noformat}


> Reordering of altering zone operations
> --
>
> Key: IGNITE-20279
> URL: https://issues.apache.org/jira/browse/IGNITE-20279
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> The issue is shown in the test, where several zone change operations occur. 
> In this case, the operation can be reordered and incomplete at the end of the 
> test. There are messages "Received update for replicas number" in the test 
> log in a wrong order. The reproducer based on 
> ItRebalanceDistributedTest#testThreeQueuedRebalances. See exception in the 
> test log:
> {code:java}
> @Test
> void testThreeQueuedRebalances() throws Exception {
> Node node = getNode(0);
> createZone(node, ZONE_NAME, 1, 1);
> createTable(node, ZONE_NAME, TABLE_NAME);
> assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
> 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> alterZone(node, ZONE_NAME, 2);
> alterZone(node, ZONE_NAME, 3);
> 

[jira] [Created] (IGNITE-20303) "Raft group on the node is already started" exception when pending and planned assignment changed faster than rebalance

2023-08-29 Thread Sergey Uttsel (Jira)
Sergey Uttsel created IGNITE-20303:
--

 Summary: "Raft group on the node is already started" exception 
when pending and planned assignment changed faster than rebalance
 Key: IGNITE-20303
 URL: https://issues.apache.org/jira/browse/IGNITE-20303
 Project: Ignite
  Issue Type: Bug
Reporter: Sergey Uttsel


If many assignment changes happen quickly, the rebalance does not have time to 
complete for each change. In that case the following exception is thrown:

{code:java}
2023-08-24T16:58:51,328][ERROR][%irdt_ttqr_2%tableManager-io-10][WatchProcessor]
 Error occurred when processing a watch event
 org.apache.ignite.lang.IgniteInternalException: Raft group on the node is 
already started [nodeId=RaftNodeId [groupId=1_part_0, peer=Peer 
[consistentId=irdt_ttqr_2, idx=0]]]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:342) 
~[main/:?]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:230) 
~[main/:?]
at 
org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:203) 
~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:2361)
 ~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$98(TableManager.java:2261)
 ~[main/:?]
at 
org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:922) 
~[main/:?]
at 
org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$99(TableManager.java:2259)
 ~[main/:?]
at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
 ~[?:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]
at java.lang.Thread.run(Thread.java:834) ~[?:?]
{code}

The reproducer is based on ItRebalanceDistributedTest#testThreeQueuedRebalances; 
see the exception in the test log:

{code:java}
@Test
void testThreeQueuedRebalances() throws Exception {
Node node = getNode(0);

createZone(node, ZONE_NAME, 1, 1);

createTable(node, ZONE_NAME, TABLE_NAME);

assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 
0).size() == 1, AWAIT_TIMEOUT_MILLIS));

alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);
alterZone(node, ZONE_NAME, 3);
alterZone(node, ZONE_NAME, 2);

waitPartitionAssignmentsSyncedToExpected(0, 2);

checkPartitionNodes(0, 2);
}
{code}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20054) Flaky tests in ItIgniteDistributionZoneManagerNodeRestartTest

2023-08-24 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20054:
---
Reviewer: Mirza Aliev

> Flaky tests in ItIgniteDistributionZoneManagerNodeRestartTest
> -
>
> Key: IGNITE-20054
> URL: https://issues.apache.org/jira/browse/IGNITE-20054
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Motivation*
> After https://issues.apache.org/jira/browse/IGNITE-19506 was implemented, some 
> tests started to fail.
> For example, the test testScaleUpTimerIsRestoredAfterRestart uses `blockUpdate` 
> to prevent data nodes from being updated in the meta storage and then checks the 
> data nodes for the zone. But now the dataNodes method returns nodes that have 
> not even been written to the meta storage, because dataNodes uses the 
> augmentation map. I tried to fix this and similar tests by checking the data 
> nodes in the meta storage, but after that these tests are flaky.
> *Definition of Done*
> Fix and re-enable the tests from ItIgniteDistributionZoneManagerNodeRestartTest.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20272) Clean up of DistributionZoneManagerWatchListenerTest

2023-08-24 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20272:
---
Reviewer: Mirza Aliev

> Clean up of DistributionZoneManagerWatchListenerTest
> 
>
> Key: IGNITE-20272
> URL: https://issues.apache.org/jira/browse/IGNITE-20272
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3, tech-debt
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> Ticket https://issues.apache.org/jira/browse/IGNITE-18564 was closed because 
> it is not actual anymore. But actually, the 
> DistributionZoneManagerWatchListenerTest that was mentioned in this ticket is 
> not actual. So we need to remove some tests in this class now and later 
> remove this class when ticket 
> https://issues.apache.org/jira/browse/IGNITE-19955 is implemented.
> DistributionZoneManagerWatchListenerTest has three tests:
> # The testStaleWatchEvent is disabled. It fails on an invoke into a 
> metastorage in which the logical topology and its version are updated. It 
> fails because the condition of the invoke was incorrect after some changes in 
> the code. But it is not necessary to use a condition there, I replaced it 
> with ms.putAll, and the test passed successfully. This test can be removed 
> because it repeats the 
> testScaleUpDidNotChangeDataNodesWhenTriggerKeyWasConcurrentlyChanged test.
> # testDataNodesUpdatedOnZoneManagerStart is the happy path of the restart, we 
> already have such tests. Therefore, this test is not needed and can be 
> removed.
> # testStaleVaultRevisionOnZoneManagerStart This test simulates that on the 
> zones manager restart, the data nodes for the zone will not be updated to the 
> value corresponding to the logical topology from the vault, because 
> zonesChangeTriggerKey > metaStorageManager.appliedRevision(). I have not 
> found an analog of this test. I think that when the DZM restart is updated 
> then we can update this test and move it to the more appropriate class.
> h3. *Definition of done*
> # testStaleWatchEvent and testDataNodesUpdatedOnZoneManagerStart are removed.
> # testStaleVaultRevisionOnZoneManagerStart marked by IGNITE-19955.
> # TODOs with IGNITE-18564 are removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20272) Clean up of DistributionZoneManagerWatchListenerTest

2023-08-23 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20272:
---
Issue Type: Improvement  (was: Bug)

> Clean up of DistributionZoneManagerWatchListenerTest
> 
>
> Key: IGNITE-20272
> URL: https://issues.apache.org/jira/browse/IGNITE-20272
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3, tech-debt
>
> h3. *Motivation*
> Ticket https://issues.apache.org/jira/browse/IGNITE-18564 was closed because 
> it is not actual anymore. But actually the 
> DistributionZoneManagerWatchListenerTest that was mentioned in this ticket is 
> not actual. So we need remove some tests in this class now and later remove 
> this class when ticket https://issues.apache.org/jira/browse/IGNITE-19955 
> will be implemented.
> DistributionZoneManagerWatchListenerTest has three tests:
> # The testStaleWatchEvent is disabled. It fails on an invoke into a 
> metastorage in which the logical topology and its version are updated. It 
> fails because the condition of the invoke was incorrect after some changes in 
> the code. But it is not necessary to use a condition there, I replaced it 
> with ms.putAll and the test passed successfully. This test can be removed 
> because it repeats the 
> testScaleUpDidNotChangeDataNodesWhenTriggerKeyWasConcurrentlyChanged test.
> # testDataNodesUpdatedOnZoneManagerStart is the happy path of restart, we 
> already have such tests. Therefore, this test is not needed and can be 
> removed.
> # testStaleVaultRevisionOnZoneManagerStart This test simulates that on the 
> zones manager restart, the data nodes for the zone will not be updated to the 
> value corresponding to the logical topology from the vault, because 
> zonesChangeTriggerKey > metaStorageManager.appliedRevision(). I have not 
> found a analogue of this test. I think that when the DZM restart is updated 
> then we can update this test and move it to more appropriate class.
> h3. *Definition of done*
> # testStaleWatchEvent and testDataNodesUpdatedOnZoneManagerStart are removed.
> # testStaleVaultRevisionOnZoneManagerStart marked by IGNITE-19955.
> # TODOs with IGNITE-18564 are removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20272) Clean up of DistributionZoneManagerWatchListenerTest

2023-08-23 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20272:
---
Description: 
h3. *Motivation*
Ticket https://issues.apache.org/jira/browse/IGNITE-18564 was closed because it is 
no longer relevant. However, the DistributionZoneManagerWatchListenerTest mentioned 
in that ticket is itself no longer relevant, so we need to remove some of its tests 
now and remove the class later, when 
https://issues.apache.org/jira/browse/IGNITE-19955 is implemented.

DistributionZoneManagerWatchListenerTest has three tests:
# testStaleWatchEvent is disabled. It fails on an invoke into the meta storage that 
updates the logical topology and its version, because the invoke's condition became 
incorrect after some changes in the code. A condition is not actually needed there: 
I replaced it with ms.putAll and the test passed. The test can be removed because 
it duplicates the 
testScaleUpDidNotChangeDataNodesWhenTriggerKeyWasConcurrentlyChanged test.
# testDataNodesUpdatedOnZoneManagerStart covers the happy path of a restart, and we 
already have such tests, so it is not needed and can be removed.
# testStaleVaultRevisionOnZoneManagerStart simulates that, on the zone manager 
restart, the data nodes for the zone are not updated to the value corresponding to 
the logical topology from the vault, because zonesChangeTriggerKey > 
metaStorageManager.appliedRevision(). I have not found an analogue of this test; 
when the DZM restart is updated, we can update this test and move it to a more 
appropriate class.

h3. *Definition of done*
# testStaleWatchEvent and testDataNodesUpdatedOnZoneManagerStart are removed.
# testStaleVaultRevisionOnZoneManagerStart is marked with IGNITE-19955.
# TODOs referencing IGNITE-18564 are removed.

  was:
h3. *Motivation*
Ticket https://issues.apache.org/jira/browse/IGNITE-18564 was closed because it 
is not actual anymore. But actually the 
DistributionZoneManagerWatchListenerTest that was mentioned in this ticket is 
not actual. So we need remove some tests in this class now and later remove 
this class when ticket https://issues.apache.org/jira/browse/IGNITE-19955 will 
be implemented.

DistributionZoneManagerWatchListenerTest has three tests:
# The testStaleWatchEvent is disabled. It fails on an invoke into a metastorage 
in which the logical topology and its version are updated. It fails because the 
condition of the invoke was incorrect after some changes in the code. But it is 
not necessary to use a condition there, I replaced it with ms.putAll and the 
test passed successfully. This test can be removed because it repeats the 
testScaleUpDidNotChangeDataNodesWhenTriggerKeyWasConcurrentlyChanged test.
# testDataNodesUpdatedOnZoneManagerStart is the happy path of restart, we 
already have such tests. Therefore, this test is not needed and can be removed.
# testStaleVaultRevisionOnZoneManagerStart This test simulates that on the 
zones manager restart, the data nodes for the zone will not be updated to the 
value corresponding to the logical topology from the vault, because 
zonesChangeTriggerKey > metaStorageManager.appliedRevision(). I have not found 
a analogue of this test. I think that when the DZM restart is updated then we 
can update this test and move it to more appropriate class.

h3. *Definition of done*
# testStaleWatchEvent and testDataNodesUpdatedOnZoneManagerStart are removed.
# testStaleVaultRevisionOnZoneManagerStart marked by IGNITE-19955.
# TOTOs with IGNITE-18564 are removed.


> Clean up of DistributionZoneManagerWatchListenerTest
> 
>
> Key: IGNITE-20272
> URL: https://issues.apache.org/jira/browse/IGNITE-20272
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3, tech-debt
>
> h3. *Motivation*
> Ticket https://issues.apache.org/jira/browse/IGNITE-18564 was closed because 
> it is not actual anymore. But actually the 
> DistributionZoneManagerWatchListenerTest that was mentioned in this ticket is 
> not actual. So we need remove some tests in this class now and later remove 
> this class when ticket https://issues.apache.org/jira/browse/IGNITE-19955 
> will be implemented.
> DistributionZoneManagerWatchListenerTest has three tests:
> # The testStaleWatchEvent is disabled. It fails on an invoke into a 
> metastorage in which the logical topology and its version are updated. It 
> fails because the condition of the invoke was incorrect after some changes in 
> the code. But it is not necessary to use a condition there, I replaced it 
> with ms.putAll and the test passed successfully. This test can be removed 
> because it repeats the 
> 

[jira] [Created] (IGNITE-20272) Clean up of DistributionZoneManagerWatchListenerTest

2023-08-23 Thread Sergey Uttsel (Jira)
Sergey Uttsel created IGNITE-20272:
--

 Summary: Clean up of DistributionZoneManagerWatchListenerTest
 Key: IGNITE-20272
 URL: https://issues.apache.org/jira/browse/IGNITE-20272
 Project: Ignite
  Issue Type: Bug
Reporter: Sergey Uttsel


h3. *Motivation*
Ticket https://issues.apache.org/jira/browse/IGNITE-18564 was closed because it 
is not actual anymore. But actually the 
DistributionZoneManagerWatchListenerTest that was mentioned in this ticket is 
not actual. So we need remove some tests in this class now and later remove 
this class when ticket https://issues.apache.org/jira/browse/IGNITE-19955 will 
be implemented.

DistributionZoneManagerWatchListenerTest has three tests:
# The testStaleWatchEvent is disabled. It fails on an invoke into a metastorage 
in which the logical topology and its version are updated. It fails because the 
condition of the invoke was incorrect after some changes in the code. But it is 
not necessary to use a condition there, I replaced it with ms.putAll and the 
test passed successfully. This test can be removed because it repeats the 
testScaleUpDidNotChangeDataNodesWhenTriggerKeyWasConcurrentlyChanged test.
# testDataNodesUpdatedOnZoneManagerStart is the happy path of restart, we 
already have such tests. Therefore, this test is not needed and can be removed.
# testStaleVaultRevisionOnZoneManagerStart This test simulates that on the 
zones manager restart, the data nodes for the zone will not be updated to the 
value corresponding to the logical topology from the vault, because 
zonesChangeTriggerKey > metaStorageManager.appliedRevision(). I have not found 
a analogue of this test. I think that when the DZM restart is updated then we 
can update this test and move it to more appropriate class.

h3. *Definition of done*
# testStaleWatchEvent and testDataNodesUpdatedOnZoneManagerStart are removed.
# testStaleVaultRevisionOnZoneManagerStart marked by IGNITE-19955.
# TOTOs with IGNITE-18564 are removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20050) Clean CausalityDataNodesEngine#zonesVersionedCfg which stores zones' configuration changes

2023-08-21 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20050:
---
Description: 
*Motivation*

CausalityDataNodesEngine#zonesVersionedCfg contains zones' configuration changes. 
It is updated with the revision and the configuration event on zone creation, a 
scale up update, a scale down update and so on, but the map never removes old 
values. We only need to keep the history of changes to some depth.

The easiest way to clear zonesVersionedCfg is to do it on meta storage compaction: 
introduce a compaction notification and, on it, clear the older configurations in 
zonesVersionedCfg, keeping only the last one (see the sketch below).
Another approach is to investigate all current and potential future usages of 
dataNodes to find out when zonesVersionedCfg can be cleared. There are cases when 
the data nodes value may be requested with the same token several times over an 
arbitrary period of time, for example before and after a rebalance. But the 
dataNodes method reads data from the meta storage, so we also need to keep 
dataNodes and other keys in the meta storage; therefore this approach also depends 
on meta storage compaction.
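
A minimal, self-contained sketch of the compaction-driven cleanup mentioned above, 
assuming the history is kept in a revision-keyed sorted map and that a compaction 
notification (hypothetical here) delivers the compacted revision:

{code:java}
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch: keep a zone's configuration history keyed by revision
// and trim it when the meta storage is compacted.
class ZoneConfigHistorySketch {
    // Revision -> configuration snapshot (the value type is simplified to String).
    private final ConcurrentSkipListMap<Long, String> versionedCfg = new ConcurrentSkipListMap<>();

    void onConfigurationChange(long revision, String cfg) {
        versionedCfg.put(revision, cfg);
    }

    // Called from the (hypothetical) compaction notification. Keeps the newest
    // entry at or below the compacted revision so reads with older causality
    // tokens still resolve to some configuration.
    void onCompaction(long compactedRevision) {
        Long keep = versionedCfg.floorKey(compactedRevision);
        if (keep != null) {
            versionedCfg.headMap(keep, false).clear();
        }
    }
}
{code}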

*Definition of Done*
 # Find out how deep the history of changes needs to be stored.
 # Remove old values.

  was:
*Motivation*

CausalityDataNodesEngine#zonesVersionedCfg contains zones' configuration 
changes. It updates with revision and configuration event on a zone creation, a 
scale up update, a scale up update and so on. But this map does not remove old 
values. We need to keep a history of changes to some depth.

*Definition of Done*
 # Find out how deep the history of changes needs to be stored.
 # Remove old values.


> Clean CausalityDataNodesEngine#zonesVersionedCfg which stores zones' 
> configuration changes
> --
>
> Key: IGNITE-20050
> URL: https://issues.apache.org/jira/browse/IGNITE-20050
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> CausalityDataNodesEngine#zonesVersionedCfg contains zones' configuration 
> changes. It updates with revision and configuration event on a zone creation, 
> a scale up update, a scale up update and so on. But this map does not remove 
> old values. We need to keep a history of changes to some depth.
> The easiest way to clear the zonesVersionedCfg is to do it on the meta 
> storage compaction. For this purpose need to create notification about 
> compaction and clear older configurations in the zonesVersionedCfg except of 
> the last one.
> Another approach is to investigate all current and potential future usages of 
> dataNodes to find out when we can clear the zonesVersionedCfg. There are 
> cases when we can request the date nodes value with the same token several 
> times over an arbitrary period of time. For example before and after the 
> rebalance. But the dataNodes method reads data from the meta storage so we 
> also need to keep dataNodes and other keys in the meta storage. Therefore, 
> this approach also depends on the meta storage compaction.
> *Definition of Done*
>  # Find out how deep the history of changes needs to be stored.
>  # Remove old values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-19403) Watch listeners must be deployed after the zone manager starts

2023-08-04 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel resolved IGNITE-19403.

  Assignee: Sergey Uttsel
Resolution: Fixed

> Watch listeners must be deployed after the zone manager starts
> --
>
> Key: IGNITE-19403
> URL: https://issues.apache.org/jira/browse/IGNITE-19403
> Project: Ignite
>  Issue Type: Test
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3, tech-debt
>
> h3. *Motivation*
> A method, 
> {{DistributionZonesTestUtil#deployWatchesAndUpdateMetaStorageRevision}}, is used 
> in tests related to the distribution zone manager to increase the meta storage 
> applied revision before the distribution zone manager starts. The method breaks 
> an invariant: the zone manager must be started before 
> metaStorageManager.deployWatches() is invoked. We need a proper solution for 
> increasing the applied revision.
>  
> The first approach is to invoke the methods in this order:
>  
> {code:java}
> vaultManager.put(new ByteArray("applied_revision"), longToBytes(1)).get();
> metaStorageManager.start();
> distributionZoneManager.start();
> metaStorageManager.deployWatches();
> {code}
>  
> First we put applied_revision, then we start metaStorageManager and 
> distributionZoneManager, and then we deploy the watches.
> The disadvantage of this approach is that ByteArray("applied_revision") is an 
> internal part of the implementation.
>  
> The second approach is to restart all of the components used in the test to 
> simulate a node restart. In this case, after the zone manager's restart, the 
> revision will be greater than zero.
> h3. *Definition of Done*
> The {{deployWatchesAndUpdateMetaStorageRevision}} method is replaced by a proper 
> solution. The approach of restarting all components should be tried.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20058) NPE in DistributionZoneManagerAlterFilterTest#testAlterFilter

2023-08-03 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20058:
---
Description: 
*{{Motivation}}*

{{DistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky: with a very 
low failure rate (1 failure in 1500 runs) it fails with an NPE.
{noformat}
2023-07-25 16:48:30:520 +0400 
[ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred when 
processing a watch event
java.lang.NullPointerException
at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source)
{noformat}
{code:java}
2023-08-01 15:55:40:440 +0300 
[INFO][%test%metastorage-watch-executor-1][ConfigurationRegistry] Failed to 
notify configuration listener
java.lang.NullPointerException
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.updateZoneConfiguration(CausalityDataNodesEngine.java:570)
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.onUpdateFilter(CausalityDataNodesEngine.java:557)
    at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateFilter$18(DistributionZoneManager.java:774)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
    at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source){code}
 
*Implementation Notes*
The reason is the wrong start order of the components:
# First, the metastorage watch listeners are deployed.
# Then DistributionZoneManager is started.

Swapping this order fixes the issue; see the sketch below.
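
A minimal sketch of the corrected order, with hypothetical component interfaces 
(the real test wiring and component types differ):

{code:java}
// Illustrative sketch: the zone manager registers its listeners in start(),
// so it must be started before meta storage watches are deployed.
class StartOrderSketch {
    interface Component {
        void start();
    }

    static void startNode(Component metaStorageManager, Component distributionZoneManager, Runnable deployWatches) {
        metaStorageManager.start();
        distributionZoneManager.start(); // listeners are registered here
        deployWatches.run();             // only now watch events start flowing
    }
}
{code}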

Also, https://issues.apache.org/jira/browse/IGNITE-19403 will be closed when this 
ticket is closed.

  was:
*{{Motivation}}*

{{DistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky and with 
very low failure rate it fails with NPE (1 fail in 1500 runs)
{noformat}
2023-07-25 16:48:30:520 +0400 
[ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred when 
processing a watch event
java.lang.NullPointerException
at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source)
{noformat}
{code:java}
2023-08-01 15:55:40:440 +0300 
[INFO][%test%metastorage-watch-executor-1][ConfigurationRegistry] Failed to 
notify configuration listener
java.lang.NullPointerException
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.updateZoneConfiguration(CausalityDataNodesEngine.java:570)
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.onUpdateFilter(CausalityDataNodesEngine.java:557)
    at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateFilter$18(DistributionZoneManager.java:774)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
    at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source){code}
 
*Implementation Notes*
The reason is the wrong start order of the 

[jira] [Updated] (IGNITE-20058) NPE in DistributionZoneManagerAlterFilterTest#testAlterFilter

2023-08-03 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20058:
---
Description: 
*{{Motivation}}*

{{DistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky and with 
very low failure rate it fails with NPE (1 fail in 1500 runs)
{noformat}
2023-07-25 16:48:30:520 +0400 
[ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred when 
processing a watch event
java.lang.NullPointerException
at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source)
{noformat}
{code:java}
2023-08-01 15:55:40:440 +0300 
[INFO][%test%metastorage-watch-executor-1][ConfigurationRegistry] Failed to 
notify configuration listener
java.lang.NullPointerException
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.updateZoneConfiguration(CausalityDataNodesEngine.java:570)
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.onUpdateFilter(CausalityDataNodesEngine.java:557)
    at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateFilter$18(DistributionZoneManager.java:774)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
    at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source){code}
 
*Implementation Notes*
The reason is the wrong start order of the components:
# Firstly metastorage watch listeners are deployed.
# Then DistributionZoneManager is started.
So I change this order to fix the issue.

Also I will close https://issues.apache.org/jira/browse/IGNITE-19403  when this 
ticket will be closed.

  was:
{{MotivationDistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky 
and with very low failure rate it fails with NPE (1 fail in 1500 runs)
{noformat}
2023-07-25 16:48:30:520 +0400 
[ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred when 
processing a watch event
java.lang.NullPointerException
at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source)
{noformat}
{code:java}
2023-08-01 15:55:40:440 +0300 
[INFO][%test%metastorage-watch-executor-1][ConfigurationRegistry] Failed to 
notify configuration listener
java.lang.NullPointerException
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.updateZoneConfiguration(CausalityDataNodesEngine.java:570)
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.onUpdateFilter(CausalityDataNodesEngine.java:557)
    at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateFilter$18(DistributionZoneManager.java:774)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
    at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source){code}


> NPE in DistributionZoneManagerAlterFilterTest#testAlterFilter
> 

[jira] [Updated] (IGNITE-20058) NPE in DistributionZoneManagerAlterFilterTest#testAlterFilter

2023-08-03 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20058:
---
Description: 
{{MotivationDistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky 
and with very low failure rate it fails with NPE (1 fail in 1500 runs)
{noformat}
2023-07-25 16:48:30:520 +0400 
[ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred when 
processing a watch event
java.lang.NullPointerException
at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source)
{noformat}
{code:java}
2023-08-01 15:55:40:440 +0300 
[INFO][%test%metastorage-watch-executor-1][ConfigurationRegistry] Failed to 
notify configuration listener
java.lang.NullPointerException
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.updateZoneConfiguration(CausalityDataNodesEngine.java:570)
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.onUpdateFilter(CausalityDataNodesEngine.java:557)
    at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateFilter$18(DistributionZoneManager.java:774)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
    at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source){code}

  was:
{{DistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky and with 
very low failure rate it fails with NPE (1 fail in 1500 runs)
{noformat}
2023-07-25 16:48:30:520 +0400 
[ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred when 
processing a watch event
java.lang.NullPointerException
at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source)
{noformat}
{code:java}
2023-08-01 15:55:40:440 +0300 
[INFO][%test%metastorage-watch-executor-1][ConfigurationRegistry] Failed to 
notify configuration listener
java.lang.NullPointerException
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.updateZoneConfiguration(CausalityDataNodesEngine.java:570)
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.onUpdateFilter(CausalityDataNodesEngine.java:557)
    at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateFilter$18(DistributionZoneManager.java:774)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
    at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source){code}


> NPE in DistributionZoneManagerAlterFilterTest#testAlterFilter
> -
>
> Key: IGNITE-20058
> URL: https://issues.apache.org/jira/browse/IGNITE-20058
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Sergey Uttsel
>Priority: Major
>  

[jira] [Assigned] (IGNITE-20058) NPE in DistributionZoneManagerAlterFilterTest#testAlterFilter

2023-08-01 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel reassigned IGNITE-20058:
--

Assignee: Sergey Uttsel  (was: Alexander Lapin)

> NPE in DistributionZoneManagerAlterFilterTest#testAlterFilter
> -
>
> Key: IGNITE-20058
> URL: https://issues.apache.org/jira/browse/IGNITE-20058
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> {{DistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky and with 
> very low failure rate it fails with NPE (1 fail in 1500 runs)
> {noformat}
> 2023-07-25 16:48:30:520 +0400 
> [ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred 
> when processing a watch event
> java.lang.NullPointerException
>   at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
>   at 
> org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
>   at 
> org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
>   at 
> org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
>   at 
> org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
>  Source)
> {noformat}
> {code:java}
> 2023-08-01 15:55:40:440 +0300 
> [INFO][%test%metastorage-watch-executor-1][ConfigurationRegistry] Failed to 
> notify configuration listener
> java.lang.NullPointerException
>     at 
> org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.updateZoneConfiguration(CausalityDataNodesEngine.java:570)
>     at 
> org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.onUpdateFilter(CausalityDataNodesEngine.java:557)
>     at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateFilter$18(DistributionZoneManager.java:774)
>     at 
> org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
>     at 
> org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
>     at 
> org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
>     at 
> org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
>  Source){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20058) NPE in DistributionZoneManagerAlterFilterTest#testAlterFilter

2023-08-01 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20058:
---
Description: 
{{DistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky and with 
very low failure rate it fails with NPE (1 fail in 1500 runs)
{noformat}
2023-07-25 16:48:30:520 +0400 
[ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred when 
processing a watch event
java.lang.NullPointerException
at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source)
{noformat}
{code:java}
2023-08-01 15:55:40:440 +0300 
[INFO][%test%metastorage-watch-executor-1][ConfigurationRegistry] Failed to 
notify configuration listener
java.lang.NullPointerException
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.updateZoneConfiguration(CausalityDataNodesEngine.java:570)
    at 
org.apache.ignite.internal.distributionzones.causalitydatanodes.CausalityDataNodesEngine.onUpdateFilter(CausalityDataNodesEngine.java:557)
    at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateFilter$18(DistributionZoneManager.java:774)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
    at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
    at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source){code}

  was:
{{DistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky and with 
very low failure rate it fails with NPE (1 fail in 1500 runs)


{noformat}
2023-07-25 16:48:30:520 +0400 
[ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred when 
processing a watch event
java.lang.NullPointerException
at 
org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
at 
org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
at 
org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
 Source)
{noformat}



> NPE in DistributionZoneManagerAlterFilterTest#testAlterFilter
> -
>
> Key: IGNITE-20058
> URL: https://issues.apache.org/jira/browse/IGNITE-20058
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> {{DistributionZoneManagerAlterFilterTest.testAlterFilter}} is flaky and with 
> very low failure rate it fails with NPE (1 fail in 1500 runs)
> {noformat}
> 2023-07-25 16:48:30:520 +0400 
> [ERROR][%test%metastorage-watch-executor-0][WatchProcessor] Error occurred 
> when processing a watch event
> java.lang.NullPointerException
>   at 
> org.apache.ignite.internal.distributionzones.DistributionZoneManager.lambda$onUpdateScaleDown$18(DistributionZoneManager.java:737)
>   at 
> org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier.notifyPublicListeners(ConfigurationNotifier.java:488)
>   at 
> org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:136)
>   at 
> org.apache.ignite.internal.configuration.notifications.ConfigurationNotifier$1.visitLeafNode(ConfigurationNotifier.java:129)
>   at 
> org.apache.ignite.internal.distributionzones.configuration.DistributionZoneNode.traverseChildren(Unknown
>  Source)
> {noformat}
> {code:java}
> 2023-08-01 15:55:40:440 +0300 
> 

[jira] [Updated] (IGNITE-20050) Clean CausalityDataNodesEngine#zonesVersionedCfg which stores zones' configuration changes

2023-07-31 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20050:
---
Epic Link: IGNITE-19743

> Clean CausalityDataNodesEngine#zonesVersionedCfg which stores zones' 
> configuration changes
> --
>
> Key: IGNITE-20050
> URL: https://issues.apache.org/jira/browse/IGNITE-20050
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> CausalityDataNodesEngine#zonesVersionedCfg contains zones' configuration 
> changes. It updates with revision and configuration event on a zone creation, 
> a scale up update, a scale down update and so on. But this map does not remove 
> old values. We need to keep a history of changes to some depth.
> *Definition of Done*
>  # Find out how deep the history of changes needs to be stored.
>  # Remove old values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19506) Use data nodes from DistributionZoneManager with a causality token instead of BaselineManager#nodes

2023-07-31 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19506:
---
Description: 
h3. *Motivation*

We need to use data nodes from DistributionZoneManager instead of 
BaselineManager#nodes everywhere except for the in-memory raft case 
(TableManager#calculateAssignments).

To get data nodes consistently, we need to use the revision of configuration 
events and meta storage events as a causality token.

A description of the causality data nodes algorithm is attached.
h3. *Definition of Done*

Implement a method DistributionZoneManager#dataNodes that obtains data nodes 
from the zone manager by a causality token (a sketch follows).

Use this method instead of BaselineManager#nodes.
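
For illustration only, a minimal, self-contained sketch of the idea behind a 
causality-token-based lookup. The class and method names below are 
assumptions; the real DistributionZoneManager#dataNodes signature and the 
VersionedValue machinery may differ.
{code:java}
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListMap;

/** Illustrative stand-in, not the real DistributionZoneManager. */
class ZoneDataNodesHistory {
    /** Data nodes snapshots keyed by the causality token (configuration / meta storage revision). */
    private final NavigableMap<Long, Set<String>> history = new ConcurrentSkipListMap<>();

    /** Called when a new data nodes value is calculated for a revision. */
    void onDataNodesUpdate(long revision, Set<String> dataNodes) {
        history.put(revision, Set.copyOf(dataNodes));
    }

    /** Returns the data nodes that were in effect at the given causality token. */
    Set<String> dataNodes(long causalityToken) {
        Map.Entry<Long, Set<String>> entry = history.floorEntry(causalityToken);

        return entry == null ? Set.of() : entry.getValue();
    }
}{code}
The point of the floor lookup is that a caller holding a revision always sees 
the data nodes that corresponded to that revision, rather than the latest 
baseline.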

  was:
h3. *Motivation*

Need to use data nodes from DistributionZoneManager instead of 
BaselineManager#nodes in:
 # DistributionZoneRebalanceEngine#onUpdateReplicas
 # TableManager#createAssignmentsSwitchRebalanceListener

We need to get data nodes consistently so we need to use revision of 
configuration events and a meta storage events as causality token. Also need to 
use VersionedValue to save data nodes with causality token.

Description of causality data nodes algorithm is attached.
h3. *Definition of Done*

DistributionZoneRebalanceEngine#onUpdateReplicas and 
TableManager#createAssignmentsSwitchRebalanceListener use data nodes from 
DistributionZoneManager


> Use data nodes from DistributionZoneManager with a causality token instead of 
> BaselineManager#nodes
> ---
>
> Key: IGNITE-19506
> URL: https://issues.apache.org/jira/browse/IGNITE-19506
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
> Attachments: Causality data nodes.docx
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> Need to use data nodes from DistributionZoneManager instead of 
> BaselineManager#nodes in all places except of in-memory raft 
> (TableManager#calculateAssignments)
> We need to get data nodes consistently so we need to use revision of 
> configuration events and a meta storage events as causality token.
> Description of causality data nodes algorithm is attached.
> h3. *Definition of Done*
> Implement method DistributionZoneManager#dataNodes to obtaining data nodes 
> from zone manager with causality token.
> Use this method instead of BaselineManager#nodes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20054) Flaky tests in ItIgniteDistributionZoneManagerNodeRestartTest

2023-07-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20054:
---
Epic Link: IGNITE-19743

> Flaky tests in ItIgniteDistributionZoneManagerNodeRestartTest
> -
>
> Key: IGNITE-20054
> URL: https://issues.apache.org/jira/browse/IGNITE-20054
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> After https://issues.apache.org/jira/browse/IGNITE-19506  was implemented 
> some tests start to fail.
> For example the test testScaleUpTimerIsRestoredAfterRestart use `blockUpdate` 
> to prevent data nodes updating in the meta storage. Then it check the data 
> nodes for the zone. But now dataNodes method returns nodes which even have 
> not written to the meta storage. Because dataNodes use augmentation map. So I 
> tried to fix this and similar tests by checking data nodes in metastorage, 
> but after that this tests are flaky.
> *Definition of Done*
> Fix and enabled tests from ItIgniteDistributionZoneManagerNodeRestartTest.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20053) Empty data nodes are returned by data nodes engine

2023-07-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20053:
---
Epic Link: IGNITE-19743

> Empty data nodes are returned by data nodes engine
> --
>
> Key: IGNITE-20053
> URL: https://issues.apache.org/jira/browse/IGNITE-20053
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>
> There is a meta storage key called DISTRIBUTION_ZONES_LOGICAL_TOPOLOGY_KEY; 
> it is refreshed by the topology listener on topology events and stores the 
> logical topology. If the value stored under this key is null, then empty data 
> nodes are returned from the data nodes engine when data nodes are calculated 
> for a distribution zone. As a result, empty assignments are calculated for 
> partitions, which leads to the exception described in IGNITE-19466.
> Some integration tests, for example ItRebalanceDistributedTest, are flaky 
> because of possible problems with the value of 
> DISTRIBUTION_ZONES_LOGICAL_TOPOLOGY_KEY and the empty data nodes calculated 
> by the data nodes engine.
> Actually, an empty data nodes collection is a wrong result in this case 
> because the current logical topology is not empty.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20054) Flaky tests in ItIgniteDistributionZoneManagerNodeRestartTest

2023-07-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20054:
---
Description: 
*Motivation*

After https://issues.apache.org/jira/browse/IGNITE-19506 was implemented, some 
tests started to fail.

For example, the test testScaleUpTimerIsRestoredAfterRestart uses `blockUpdate` 
to prevent data nodes updates in the meta storage and then checks the data 
nodes for the zone. But now the dataNodes method returns nodes that have not 
even been written to the meta storage, because dataNodes uses the augmentation 
map. I tried to fix this and similar tests by checking the data nodes in the 
meta storage (see the sketch below), but after that these tests are flaky.

*Definition of Done*

Fix and enable the tests from ItIgniteDistributionZoneManagerNodeRestartTest.
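
As an illustration of the attempted fix (asserting against the meta storage 
instead of the in-memory augmentation map), a generic polling helper like the 
sketch below can be used in a test. This is not Ignite's actual test utility; 
the helper and the accessor names in the usage comment are assumptions.
{code:java}
import java.util.function.BooleanSupplier;

/** Generic test helper: polls a condition until it holds or the timeout elapses. */
final class TestWait {
    static boolean waitForCondition(BooleanSupplier condition, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;

        do {
            if (condition.getAsBoolean()) {
                return true;
            }

            Thread.sleep(50);
        } while (System.currentTimeMillis() < deadline);

        return condition.getAsBoolean();
    }
}

// Hypothetical usage in a test:
// assertTrue(TestWait.waitForCondition(
//         () -> expectedNodes.equals(readDataNodesKeyFromMetaStorage(zoneId)), 5_000));{code}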

  was:
*Motivation*

After https://issues.apache.org/jira/browse/IGNITE-19506  was implemented some 
tests start to fail.

For example the test testScaleUpTimerIsRestoredAfterRestart use `blockUpdate` 
to prevent data nodes updating in the meta storage. Then it check the data 
nodes for the zone. But now dataNodes method returns nodes which even have not 
written to the meta storage. Because dataNodes use augmentation map. So I tried 
to fix this and similar tests by checking data nodes in metastorage, but after 
that this tests are flaky.

*Definition of Done*

Fix and enabled tests from 


> Flaky tests in ItIgniteDistributionZoneManagerNodeRestartTest
> -
>
> Key: IGNITE-20054
> URL: https://issues.apache.org/jira/browse/IGNITE-20054
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> After https://issues.apache.org/jira/browse/IGNITE-19506  was implemented 
> some tests start to fail.
> For example the test testScaleUpTimerIsRestoredAfterRestart use `blockUpdate` 
> to prevent data nodes updating in the meta storage. Then it check the data 
> nodes for the zone. But now dataNodes method returns nodes which even have 
> not written to the meta storage. Because dataNodes use augmentation map. So I 
> tried to fix this and similar tests by checking data nodes in metastorage, 
> but after that this tests are flaky.
> *Definition of Done*
> Fix and enabled tests from ItIgniteDistributionZoneManagerNodeRestartTest.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20054) Flaky tests in ItIgniteDistributionZoneManagerNodeRestartTest

2023-07-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20054:
---
Description: 
*Motivation*

After https://issues.apache.org/jira/browse/IGNITE-19506  was implemented some 
tests start to fail.

For example the test testScaleUpTimerIsRestoredAfterRestart use `blockUpdate` 
to prevent data nodes updating in the meta storage. Then it check the data 
nodes for the zone. But now dataNodes method returns nodes which even have not 
written to the meta storage. Because dataNodes use augmentation map. So I tried 
to fix this and similar tests by checking data nodes in metastorage, but after 
that this tests are flaky.

*Definition of Done*

Fix and enabled tests from 

  was:
*Motivation*

After https://issues.apache.org/jira/browse/IGNITE-19506  was implemented some 
tests start to fail.

For example the test testScaleUpTimerIsRestoredAfterRestart use `blockUpdate` 
to prevent data nodes updating in the meta storage. Then it check the data 
nodes for the zone. But now dataNodes method returns nodes which even have not 
written to the meta storage. Because dataNodes use augmentation map. So I tried 
to fix this and similar tests by checking data nodes in metastorage, but after 
that this tests are flaky.

*Definition of Done*


> Flaky tests in ItIgniteDistributionZoneManagerNodeRestartTest
> -
>
> Key: IGNITE-20054
> URL: https://issues.apache.org/jira/browse/IGNITE-20054
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> After https://issues.apache.org/jira/browse/IGNITE-19506  was implemented 
> some tests start to fail.
> For example the test testScaleUpTimerIsRestoredAfterRestart use `blockUpdate` 
> to prevent data nodes updating in the meta storage. Then it check the data 
> nodes for the zone. But now dataNodes method returns nodes which even have 
> not written to the meta storage. Because dataNodes use augmentation map. So I 
> tried to fix this and similar tests by checking data nodes in metastorage, 
> but after that this tests are flaky.
> *Definition of Done*
> Fix and enabled tests from 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-20054) Flaky tests in ItIgniteDistributionZoneManagerNodeRestartTest

2023-07-26 Thread Sergey Uttsel (Jira)
Sergey Uttsel created IGNITE-20054:
--

 Summary: Flaky tests in 
ItIgniteDistributionZoneManagerNodeRestartTest
 Key: IGNITE-20054
 URL: https://issues.apache.org/jira/browse/IGNITE-20054
 Project: Ignite
  Issue Type: Bug
Reporter: Sergey Uttsel


*Motivation*

After https://issues.apache.org/jira/browse/IGNITE-19506  was implemented some 
tests start to fail.

For example the test testScaleUpTimerIsRestoredAfterRestart use `blockUpdate` 
to prevent data nodes updating in the meta storage. Then it check the data 
nodes for the zone. But now dataNodes method returns nodes which even have not 
written to the meta storage. Because dataNodes use augmentation map. So I tried 
to fix this and similar tests by checking data nodes in metastorage, but after 
that this tests are flaky.

*Definition of Done*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20050) Clean CausalityDataNodesEngine#zonesVersionedCfg which stores zones' configuration changes

2023-07-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-20050:
---
Summary: Clean CausalityDataNodesEngine#zonesVersionedCfg which stores 
zones' configuration changes  (was: Clean 
CausalityDataNodesEngine#zonesVersionedCfg)

> Clean CausalityDataNodesEngine#zonesVersionedCfg which stores zones' 
> configuration changes
> --
>
> Key: IGNITE-20050
> URL: https://issues.apache.org/jira/browse/IGNITE-20050
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> CausalityDataNodesEngine#zonesVersionedCfg contains zones' configuration 
> changes. It updates with revision and configuration event on a zone creation, 
> a scale up update, a scale down update and so on. But this map does not remove 
> old values. We need to keep a history of changes to some depth.
> *Definition of Done*
>  # Find out how deep the history of changes needs to be stored.
>  # Remove old values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-20050) Clean CausalityDataNodesEngine#zonesVersionedCfg

2023-07-26 Thread Sergey Uttsel (Jira)
Sergey Uttsel created IGNITE-20050:
--

 Summary: Clean CausalityDataNodesEngine#zonesVersionedCfg
 Key: IGNITE-20050
 URL: https://issues.apache.org/jira/browse/IGNITE-20050
 Project: Ignite
  Issue Type: Improvement
Reporter: Sergey Uttsel


*Motivation*

CausalityDataNodesEngine#zonesVersionedCfg contains zones' configuration 
changes. It is updated with the revision and the configuration event on a zone 
creation, a scale up update, a scale down update and so on. But this map does 
not remove old values. We need to keep the history of changes only to some 
depth.
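
A minimal sketch, under the assumption that the map is keyed by revision, of 
how old entries could be trimmed once the required history depth is known. 
This is illustrative only, not the actual CausalityDataNodesEngine code.
{code:java}
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Illustrative holder of per-zone versioned configuration, keyed by revision. */
class ZoneVersionedConfig<C> {
    private final NavigableMap<Long, C> changesByRevision = new ConcurrentSkipListMap<>();

    void onConfigChange(long revision, C config) {
        changesByRevision.put(revision, config);
    }

    /** Returns the configuration that was in effect at the given revision. */
    C configAt(long revision) {
        Map.Entry<Long, C> entry = changesByRevision.floorEntry(revision);

        return entry == null ? null : entry.getValue();
    }

    /** Removes entries older than the given revision, keeping the one still in effect at it. */
    void trimHistory(long oldestNeededRevision) {
        Long keep = changesByRevision.floorKey(oldestNeededRevision);

        if (keep != null) {
            changesByRevision.headMap(keep, false).clear();
        }
    }
}{code}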

*Definition of Done*
 # Find out how deep the history of changes needs to be stored.
 # Remove old values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-19507) [TC Bot] Doesn't send messages to Slack

2023-07-03 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel resolved IGNITE-19507.

Resolution: Not A Problem

> [TC Bot] Doesn't send messages to Slack
> ---
>
> Key: IGNITE-19507
> URL: https://issues.apache.org/jira/browse/IGNITE-19507
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>
> TC Bot doesn't send messages to Slack. For example:
>  * Open [https://mtcga.gridgain.com/monitoring.html]
>  * Press "Send" button for Test Slack notification.
>  * {*}Expected{*}: new message in Slack chat.
>  * {*}Actual{*}: no messages.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-19507) [TC Bot] Doesn't send messages to Slack

2023-06-30 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-19507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739075#comment-17739075
 ] 

Sergey Uttsel edited comment on IGNITE-19507 at 6/30/23 1:16 PM:
-

The reason is that the *TC Bot account* was not added to the *tc_green_again* 
chat. It was not in the chat earlier either, but now this membership may be 
required for the notifications to work.


was (Author: sergey uttsel):
The reason is that *mtcga* was not added to *tc_green_again* chat. Earlier, it 
was also not added to the chat, but now it may be necessary for it to work.

> [TC Bot] Doesn't send messages to Slack
> ---
>
> Key: IGNITE-19507
> URL: https://issues.apache.org/jira/browse/IGNITE-19507
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>
> TC Bot doesn't send messages to Slack. For example:
>  * Open [https://mtcga.gridgain.com/monitoring.html]
>  * Press "Send" button for Test Slack notification.
>  * {*}Expected{*}: new message in Slack chat.
>  * {*}Actual{*}: no messages.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-19507) [TC Bot] Doesn't send messages to Slack

2023-06-30 Thread Sergey Uttsel (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-19507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739075#comment-17739075
 ] 

Sergey Uttsel commented on IGNITE-19507:


The reason is that *mtcga* was not added to *tc_green_again* chat. Earlier, it 
was also not added to the chat, but now it may be necessary for it to work.

> [TC Bot] Doesn't send messages to Slack
> ---
>
> Key: IGNITE-19507
> URL: https://issues.apache.org/jira/browse/IGNITE-19507
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>
> TC Bot doesn't send messages to Slack. For example:
>  * Open [https://mtcga.gridgain.com/monitoring.html]
>  * Press "Send" button for Test Slack notification.
>  * {*}Expected{*}: new message in Slack chat.
>  * {*}Actual{*}: no messages.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19783) StripedScheduledExecutorService for DistributionZoneManager#executor

2023-06-27 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19783:
---
Description: 
h3. *Motivation*

In https://issues.apache.org/jira/browse/IGNITE-19736 we set corePoolSize=1 for 
DistributionZoneManager#executor to ensure that all data nodes calculation 
tasks for a zone are executed in their creation order. But we need more threads 
to process these tasks, so we need to create a StripedScheduledExecutorService 
in which all tasks for the same zone are executed in one stripe. The stripe 
that executes a task is chosen by the zone id.
h3. *Definition of Done*
 # StripedScheduledExecutorService is created and used instead of the 
single-thread executor in DistributionZoneManager.
 # All tasks for the same zone must be executed in one stripe.

h3. *Implementation Notes*

I've created a draft StripedScheduledExecutorService in a branch 
[https://github.com/gridgain/apache-ignite-3/tree/ignite-19783]
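
Separately from the draft branch above, here is a minimal, self-contained 
sketch of the striping idea, where the stripe is chosen by the zone id. It is 
illustrative only; the real StripedScheduledExecutorService would implement 
ScheduledExecutorService, and the class name below is an assumption.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative striped scheduler: all tasks of one zone go to the same single-thread stripe. */
class StripedZoneScheduler {
    private final ScheduledExecutorService[] stripes;

    StripedZoneScheduler(int stripeCount) {
        stripes = new ScheduledExecutorService[stripeCount];

        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = Executors.newSingleThreadScheduledExecutor();
        }
    }

    /** Schedules a task for the given zone; tasks of the same zone keep their creation order. */
    ScheduledFuture<?> schedule(int zoneId, Runnable task, long delay, TimeUnit unit) {
        return stripeFor(zoneId).schedule(task, delay, unit);
    }

    private ScheduledExecutorService stripeFor(int zoneId) {
        return stripes[Math.floorMod(zoneId, stripes.length)];
    }

    void shutdown() {
        for (ScheduledExecutorService stripe : stripes) {
            stripe.shutdown();
        }
    }
}{code}
Because each stripe is a single-thread executor, ordering per zone is preserved 
while different zones can be processed in parallel.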

  was:
h3. *Motivation*

In https://issues.apache.org/jira/browse/IGNITE-19736 we set corePoolSize=1 for 
DistributionZoneManager#executor to ensure that all data nodes calculation 
tasks per a zone are executed in order of creation. But we need more threads to 
process this tasks. So we need to create StripedScheduledExecutorService and 
all tasks for the same zone must be executed in one stripe. The pool to execute 
the task is defined by a zone id.
h3. *Definition of Done*
 # StripedScheduledExecutorService is created and used instead of single thread 
executor in DistributionZoneManager.
 # All tasks for the same zone must be executed in one stripe.

h3. *Implementation Notes*

I've created  [https://github.com/gridgain/apache-ignite-3/tree/ignite-19783]


> StripedScheduledExecutorService for DistributionZoneManager#executor
> 
>
> Key: IGNITE-19783
> URL: https://issues.apache.org/jira/browse/IGNITE-19783
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> In https://issues.apache.org/jira/browse/IGNITE-19736 we set corePoolSize=1 
> for DistributionZoneManager#executor to ensure that all data nodes 
> calculation tasks per a zone are executed in order of creation. But we need 
> more threads to process this tasks. So we need to create 
> StripedScheduledExecutorService and all tasks for the same zone must be 
> executed in one stripe. The pool to execute the task is defined by a zone id.
> h3. *Definition of Done*
>  # StripedScheduledExecutorService is created and used instead of single 
> thread executor in DistributionZoneManager.
>  # All tasks for the same zone must be executed in one stripe.
> h3. *Implementation Notes*
> I've created a draft StripedScheduledExecutorService in a branch 
> [https://github.com/gridgain/apache-ignite-3/tree/ignite-19783]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19783) StripedScheduledExecutorService for DistributionZoneManager#executor

2023-06-27 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19783:
---
Description: 
h3. *Motivation*

In https://issues.apache.org/jira/browse/IGNITE-19736 we set corePoolSize=1 for 
DistributionZoneManager#executor to ensure that all data nodes calculation 
tasks per a zone are executed in order of creation. But we need more threads to 
process this tasks. So we need to create StripedScheduledExecutorService and 
all tasks for the same zone must be executed in one stripe. The pool to execute 
the task is defined by a zone id.
h3. *Definition of Done*
 # StripedScheduledExecutorService is created and used instead of single thread 
executor in DistributionZoneManager.
 # All tasks for the same zone must be executed in one stripe.

h3. *Implementation Notes*

I've created  [https://github.com/gridgain/apache-ignite-3/tree/ignite-19783]

  was:
h3. *Motivation*

In https://issues.apache.org/jira/browse/IGNITE-19736 we set corePoolSize=1 for 
DistributionZoneManager#executor to ensure that all data nodes calculation 
tasks per a zone are executed in order of creation. But we need more threads to 
process this tasks. So we need to create StripedScheduledExecutorService and 
all tasks for the same zone must be executed in one stripe. The pool to execute 
the task is defined by a zone id.
h3. *Definition of Done*
 # StripedScheduledExecutorService is created and used instead of single thread 
executor in DistributionZoneManager.
 # All tasks for the same zone must be executed in one stripe.


> StripedScheduledExecutorService for DistributionZoneManager#executor
> 
>
> Key: IGNITE-19783
> URL: https://issues.apache.org/jira/browse/IGNITE-19783
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> In https://issues.apache.org/jira/browse/IGNITE-19736 we set corePoolSize=1 
> for DistributionZoneManager#executor to ensure that all data nodes 
> calculation tasks per a zone are executed in order of creation. But we need 
> more threads to process this tasks. So we need to create 
> StripedScheduledExecutorService and all tasks for the same zone must be 
> executed in one stripe. The pool to execute the task is defined by a zone id.
> h3. *Definition of Done*
>  # StripedScheduledExecutorService is created and used instead of single 
> thread executor in DistributionZoneManager.
>  # All tasks for the same zone must be executed in one stripe.
> h3. *Implementation Notes*
> I've created  [https://github.com/gridgain/apache-ignite-3/tree/ignite-19783]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19735) Create implementation of MetaStorageManager for interaction with the local meta storage

2023-06-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19735:
---
Description: 
h3. *Motivation*

MetaStorageManager has methods for distributed interaction with the meta 
storage. But a getLocally method has now been added to retrieve entries from 
the local KeyValueStorage, and there will be more such methods. So we need to 
(see the sketch after the Definition of Done):
 # Create an implementation of the MetaStorageManager interface for interaction 
with the local KeyValueStorage. Name it, for example, 
LocalMetaStorageManagerImpl.
 # Create a method `MetaStorageManager local()` in MetaStorageManager.
 # For MetaStorageManagerImpl it will return LocalMetaStorageManagerImpl.
 # For LocalMetaStorageManagerImpl it will throw UnsupportedOperationException.
 # Methods in LocalMetaStorageManagerImpl that cannot work with the local meta 
storage (for example put, invoke) must throw UnsupportedOperationException.
 # Remove the method MetaStorageManager#getLocally(byte[] key, long 
revLowerBound, long revUpperBound) and create MetaStorageManager#get(byte[] 
key, long revLowerBound, long revUpperBound). The behavior of this method will 
depend on the implementation (distributed or local).

h3. *Definition of Done*
 # Create an implementation of MetaStorageManager for interaction with the 
local meta storage.
 # MetaStorageManager has a method to get an instance of the local 
KeyValueStorage interface.
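
A tiny, self-contained sketch of the intended split. The interface and class 
below are illustrative stand-ins, not Ignite's real MetaStorageManager API.
{code:java}
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative stand-in for the manager interface (not Ignite's real API). */
interface KeyValueManager {
    CompletableFuture<String> get(String key);

    CompletableFuture<Void> put(String key, String value);

    /** Returns a view of this manager that serves reads from the local storage only. */
    KeyValueManager local();
}

/** Local-only view: reads are answered locally, distributed operations are rejected. */
class LocalOnlyKeyValueManager implements KeyValueManager {
    private final Map<String, String> localStorage = new ConcurrentHashMap<>();

    @Override
    public CompletableFuture<String> get(String key) {
        return CompletableFuture.completedFuture(localStorage.get(key));
    }

    @Override
    public CompletableFuture<Void> put(String key, String value) {
        // Distributed writes cannot be served by the local view.
        throw new UnsupportedOperationException("put is a distributed operation");
    }

    @Override
    public KeyValueManager local() {
        // Already a local view, as described in the list above.
        throw new UnsupportedOperationException("already a local view");
    }
}{code}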

  was:
h3. *Motivation*

MetaStorageManager has methods for distributive interaction with meta storage. 
But now there is added getLocally method to retrieve entries from the local 
KeyValueStorage. There will be more such methods. So we need:
 # create implementation of MetaStorageManager interface for interaction with 
local KeyValueStorage. Named it for example LocalMetaStorageManagerImpl.
 # Create method `MetaStorageManager local()` in MetaStorageManager.
 # For MetaStorageManagerImpl it will return LocalMetaStorageManagerImpl
 # For LocalMetaStorageManagerImpl it will throw UnsupportedOperationException.
 # Methods in LocalMetaStorageManagerImpl which cannot work will the local meta 
storage (for example put, invoke) must throw UnsupportedOperationException.

h3. *Definition of Done*
 # Create implementation of MetaStorageManager for interaction with the local 
meta storage
 # MetaStorageManager has a method to get instance of local KeyValueStorage 
interface.


> Create implementation of MetaStorageManager for interaction with the local 
> meta storage
> ---
>
> Key: IGNITE-19735
> URL: https://issues.apache.org/jira/browse/IGNITE-19735
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> MetaStorageManager has methods for distributive interaction with meta 
> storage. But now there is added getLocally method to retrieve entries from 
> the local KeyValueStorage. There will be more such methods. So we need:
>  # create implementation of MetaStorageManager interface for interaction with 
> local KeyValueStorage. Named it for example LocalMetaStorageManagerImpl.
>  # Create method `MetaStorageManager local()` in MetaStorageManager.
>  # For MetaStorageManagerImpl it will return LocalMetaStorageManagerImpl
>  # For LocalMetaStorageManagerImpl it will throw 
> UnsupportedOperationException.
>  # Methods in LocalMetaStorageManagerImpl which cannot work will the local 
> meta storage (for example put, invoke) must throw 
> UnsupportedOperationException.
>  # Remove method MetaStorageManager#getLocally(byte[] key, long 
> revLowerBound, long revUpperBound) and create MetaStorageManager#get(byte[] 
> key, long revLowerBound, long revUpperBound). The behavior of this method 
> will be depend implementation (distributed or local).
> h3. *Definition of Done*
>  # Create implementation of MetaStorageManager for interaction with the local 
> meta storage
>  # MetaStorageManager has a method to get instance of local KeyValueStorage 
> interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19735) Create implementation of MetaStorageManager for interaction with the local meta storage

2023-06-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19735:
---
Description: 
h3. *Motivation*

MetaStorageManager has methods for distributive interaction with meta storage. 
But now there is added getLocally method to retrieve entries from the local 
KeyValueStorage. There will be more such methods. So we need:
 # create implementation of MetaStorageManager interface for interaction with 
local KeyValueStorage. Named it for example LocalMetaStorageManagerImpl.
 # Create method `MetaStorageManager local()` in MetaStorageManager.
 # For MetaStorageManagerImpl it will return LocalMetaStorageManagerImpl
 # For LocalMetaStorageManagerImpl it will throw UnsupportedOperationException.
 # Methods in LocalMetaStorageManagerImpl which cannot work will the local meta 
storage (for example put, invoke) must throw UnsupportedOperationException.

h3. *Definition of Done*
 # Create implementation of MetaStorageManager for interaction with the local 
meta storage
 # MetaStorageManager has a method to get instance of local KeyValueStorage 
interface.

  was:
h3. *Motivation*

MetaStorageManager has methods for distributive interaction with meta storage. 
But now there is added getLocally method to retrieve entries from the local 
KeyValueStorage. There will be more such methods. So we need:

create implementation of MetaStorageManager interface for interaction with 
local KeyValueStorage. Named it for example LocalMetaStorageManagerImpl.

create method `MetaStorageManager local()` in MetaStorageManager.

For 
MetaStorageManagerImpl it will return LocalMetaStorageManagerImpl
For LocalMetaStorageManagerImpl it will throw UnsupportedOperationException.
Methods in LocalMetaStorageManagerImpl which cannot work will the local meta 
storage must throw UnsupportedOperationException.
h3. *Definition of Done*
 # Created new interface for interaction with local KeyValueStorage.
 # MetaStorageManager has a method to get instance of local KeyValueStorage 
interface.


> Create implementation of MetaStorageManager for interaction with the local 
> meta storage
> ---
>
> Key: IGNITE-19735
> URL: https://issues.apache.org/jira/browse/IGNITE-19735
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> MetaStorageManager has methods for distributive interaction with meta 
> storage. But now there is added getLocally method to retrieve entries from 
> the local KeyValueStorage. There will be more such methods. So we need:
>  # create implementation of MetaStorageManager interface for interaction with 
> local KeyValueStorage. Named it for example LocalMetaStorageManagerImpl.
>  # Create method `MetaStorageManager local()` in MetaStorageManager.
>  # For MetaStorageManagerImpl it will return LocalMetaStorageManagerImpl
>  # For LocalMetaStorageManagerImpl it will throw 
> UnsupportedOperationException.
>  # Methods in LocalMetaStorageManagerImpl which cannot work will the local 
> meta storage (for example put, invoke) must throw 
> UnsupportedOperationException.
> h3. *Definition of Done*
>  # Create implementation of MetaStorageManager for interaction with the local 
> meta storage
>  # MetaStorageManager has a method to get instance of local KeyValueStorage 
> interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19735) Create implementation of MetaStorageManager for interaction with the local meta storage

2023-06-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19735:
---
Description: 
h3. *Motivation*

MetaStorageManager has methods for distributive interaction with meta storage. 
But now there is added getLocally method to retrieve entries from the local 
KeyValueStorage. There will be more such methods. So we need:

create implementation of MetaStorageManager interface for interaction with 
local KeyValueStorage. Named it for example LocalMetaStorageManagerImpl.

create method `MetaStorageManager local()` in MetaStorageManager.

For 
MetaStorageManagerImpl it will return LocalMetaStorageManagerImpl
For LocalMetaStorageManagerImpl it will throw UnsupportedOperationException.
Methods in LocalMetaStorageManagerImpl which cannot work will the local meta 
storage must throw UnsupportedOperationException.
h3. *Definition of Done*
 # Created new interface for interaction with local KeyValueStorage.
 # MetaStorageManager has a method to get instance of local KeyValueStorage 
interface.

  was:
h3. *Motivation*

MetaStorageManager has methods for distributive interaction with meta storage. 
But now there is added getLocally method to retrieve entries from the local 
KeyValueStorage. There will be more such methods. So we need to create 
implementation interface for interaction with local KeyValueStorage.
h3. *Definition of Done*
 # Created new interface for interaction with local KeyValueStorage.
 # MetaStorageManager has a method to get instance of local KeyValueStorage 
interface.


> Create implementation of MetaStorageManager for interaction with the local 
> meta storage
> ---
>
> Key: IGNITE-19735
> URL: https://issues.apache.org/jira/browse/IGNITE-19735
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> MetaStorageManager has methods for distributive interaction with meta 
> storage. But now there is added getLocally method to retrieve entries from 
> the local KeyValueStorage. There will be more such methods. So we need:
> create implementation of MetaStorageManager interface for interaction with 
> local KeyValueStorage. Named it for example LocalMetaStorageManagerImpl.
> create method `MetaStorageManager local()` in MetaStorageManager.
> For 
> MetaStorageManagerImpl it will return LocalMetaStorageManagerImpl
> For LocalMetaStorageManagerImpl it will throw UnsupportedOperationException.
> Methods in LocalMetaStorageManagerImpl which cannot work will the local meta 
> storage must throw UnsupportedOperationException.
> h3. *Definition of Done*
>  # Created new interface for interaction with local KeyValueStorage.
>  # MetaStorageManager has a method to get instance of local KeyValueStorage 
> interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19735) Create implementation of MetaStorageManager for interaction with the local meta storage

2023-06-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19735:
---
Summary: Create implementation of MetaStorageManager for interaction with 
the local meta storage  (was: Create proxy of MetaStorageManager for 
interaction with the local meta storage)

> Create implementation of MetaStorageManager for interaction with the local 
> meta storage
> ---
>
> Key: IGNITE-19735
> URL: https://issues.apache.org/jira/browse/IGNITE-19735
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> MetaStorageManager has methods for distributive interaction with meta 
> storage. But now there is added getLocally method to retrieve entries from 
> the local KeyValueStorage. There will be more such methods. So we need to 
> create implementation interface for interaction with local KeyValueStorage.
> h3. *Definition of Done*
>  # Created new interface for interaction with local KeyValueStorage.
>  # MetaStorageManager has a method to get instance of local KeyValueStorage 
> interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19735) Create proxy of MetaStorageManager for interaction with the local meta storage

2023-06-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19735:
---
Description: 
h3. *Motivation*

MetaStorageManager has methods for distributive interaction with meta storage. 
But now there is added getLocally method to retrieve entries from the local 
KeyValueStorage. There will be more such methods. So we need to create 
implementation interface for interaction with local KeyValueStorage.
h3. *Definition of Done*
 # Created new interface for interaction with local KeyValueStorage.
 # MetaStorageManager has a method to get instance of local KeyValueStorage 
interface.

  was:
h3. *Motivation*

MetaStorageManager has methods for distributive interaction with meta storage. 
But now there is added getLocally method to retrieve entries from the local 
KeyValueStorage. There will be more such methods. So we need to create 
separated interface for interaction with local KeyValueStorage.
h3. *Definition of Done*
 # Created new interface for interaction with local KeyValueStorage.
 # MetaStorageManager has a method to get instance of local KeyValueStorage 
interface.


> Create proxy of MetaStorageManager for interaction with the local meta storage
> --
>
> Key: IGNITE-19735
> URL: https://issues.apache.org/jira/browse/IGNITE-19735
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> MetaStorageManager has methods for distributive interaction with meta 
> storage. But now there is added getLocally method to retrieve entries from 
> the local KeyValueStorage. There will be more such methods. So we need to 
> create implementation interface for interaction with local KeyValueStorage.
> h3. *Definition of Done*
>  # Created new interface for interaction with local KeyValueStorage.
>  # MetaStorageManager has a method to get instance of local KeyValueStorage 
> interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19735) Create implementation of MetaStorageManager for interaction with the local meta storage

2023-06-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19735:
---
Description: 
h3. *Motivation*

MetaStorageManager has methods for distributive interaction with meta storage. 
But now there is added getLocally method to retrieve entries from the local 
KeyValueStorage. There will be more such methods. So we need to create 
separated interface for interaction with local KeyValueStorage.
h3. *Definition of Done*
 # Created new interface for interaction with local KeyValueStorage.
 # MetaStorageManager has a method to get instance of local KeyValueStorage 
interface.

  was:
h3. *Motivation*

MetaStorageManager has methods for distributive interaction with meta storage. 
But now there is added getEntriesLocally method to retrieve entries from the 
local KeyValueStorage. There will be more such methods. So we need to create 
separated interface for interaction with local KeyValueStorage.
h3. *Definition of Done*
 # Created new interface for interaction with local KeyValueStorage.
 # MetaStorageManager has a method to get instance of local KeyValueStorage 
interface.


> Create implementation of MetaStorageManager for interaction with the local 
> meta storage
> ---
>
> Key: IGNITE-19735
> URL: https://issues.apache.org/jira/browse/IGNITE-19735
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> MetaStorageManager has methods for distributive interaction with meta 
> storage. But now there is added getLocally method to retrieve entries from 
> the local KeyValueStorage. There will be more such methods. So we need to 
> create separated interface for interaction with local KeyValueStorage.
> h3. *Definition of Done*
>  # Created new interface for interaction with local KeyValueStorage.
>  # MetaStorageManager has a method to get instance of local KeyValueStorage 
> interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19735) Create proxy of MetaStorageManager for interaction with the local meta storage

2023-06-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19735:
---
Summary: Create proxy of MetaStorageManager for interaction with the local 
meta storage  (was: Create implementation of MetaStorageManager for interaction 
with the local meta storage)

> Create proxy of MetaStorageManager for interaction with the local meta storage
> --
>
> Key: IGNITE-19735
> URL: https://issues.apache.org/jira/browse/IGNITE-19735
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> MetaStorageManager has methods for distributive interaction with meta 
> storage. But now there is added getLocally method to retrieve entries from 
> the local KeyValueStorage. There will be more such methods. So we need to 
> create separated interface for interaction with local KeyValueStorage.
> h3. *Definition of Done*
>  # Created new interface for interaction with local KeyValueStorage.
>  # MetaStorageManager has a method to get instance of local KeyValueStorage 
> interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19735) Create implementation of MetaStorageManager for interaction with the local meta storage

2023-06-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19735:
---
Summary: Create implementation of MetaStorageManager for interaction with 
the local meta storage  (was: Create interface for interaction with local 
KeyValueStorage of the meta storage)

> Create implementation of MetaStorageManager for interaction with the local 
> meta storage
> ---
>
> Key: IGNITE-19735
> URL: https://issues.apache.org/jira/browse/IGNITE-19735
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Assignee: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. *Motivation*
> MetaStorageManager has methods for distributive interaction with meta 
> storage. But now there is added getEntriesLocally method to retrieve entries 
> from the local KeyValueStorage. There will be more such methods. So we need 
> to create separated interface for interaction with local KeyValueStorage.
> h3. *Definition of Done*
>  # Created new interface for interaction with local KeyValueStorage.
>  # MetaStorageManager has a method to get instance of local KeyValueStorage 
> interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19783) StripedScheduledExecutorService for DistributionZoneManager#executor

2023-06-26 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19783:
---
Description: 
h3. *Motivation*

In https://issues.apache.org/jira/browse/IGNITE-19736 we set corePoolSize=1 for 
DistributionZoneManager#executor to ensure that all data nodes calculation 
tasks per a zone are executed in order of creation. But we need more threads to 
process this tasks. So we need to create StripedScheduledExecutorService and 
all tasks for the same zone must be executed in one stripe. The pool to execute 
the task is defined by a zone id.
h3. *Definition of Done*
 # StripedScheduledExecutorService is created and used instead of single thread 
executor in DistributionZoneManager.
 # All tasks for the same zone must be executed in one stripe.

  was:
h3. *Motivation*

In https://issues.apache.org/jira/browse/IGNITE-19736 we set corePoolSize=1 for 
DistributionZoneManager#executor to ensure that all data nodes calculation 
tasks per a zone are executed in order of creation. But we need more threads to 
process this tasks. So we need to create StripedScheduledExecutorService and 
all tasks for the same zone must be executed in one stripe.
h3. *Definition of Done*
 # StripedScheduledExecutorService is created and used instead of single thread 
executor in DistributionZoneManager.
 # All tasks for the same zone must be executed in one stripe.


> StripedScheduledExecutorService for DistributionZoneManager#executor
> 
>
> Key: IGNITE-19783
> URL: https://issues.apache.org/jira/browse/IGNITE-19783
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> In https://issues.apache.org/jira/browse/IGNITE-19736 we set corePoolSize=1 
> for DistributionZoneManager#executor to ensure that all data nodes 
> calculation tasks per a zone are executed in order of creation. But we need 
> more threads to process this tasks. So we need to create 
> StripedScheduledExecutorService and all tasks for the same zone must be 
> executed in one stripe. The pool to execute the task is defined by a zone id.
> h3. *Definition of Done*
>  # StripedScheduledExecutorService is created and used instead of single 
> thread executor in DistributionZoneManager.
>  # All tasks for the same zone must be executed in one stripe.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19782) Throw CompactedException if the revision in KeyValueStorage methods is lower than the compaction revision

2023-06-22 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19782:
---
Summary: Throw CompactedException if the revision in KeyValueStorage 
methods is lower than the compaction revision  (was: Create an ability to 
obtain the compaction revision)

> Throw CompactedException if the revision in KeyValueStorage methods is lower 
> than the compaction revision
> -
>
> Key: IGNITE-19782
> URL: https://issues.apache.org/jira/browse/IGNITE-19782
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> Implementations of some methods in RocksDbKeyValueStorage have a revision as 
> a parameter. For example, methods doGet, doGetAll and other. Need to check 
> that this revision is not compacted and throw exception if the revision is 
> lower. For this purpose need to know the last compacted revision.
> h3. *Definition of Done*
>  # Throw CompactedException if the revision in KeyValueStorage methods is 
> lower than the compaction revision.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19782) Create an ability to obtain the compaction revision

2023-06-22 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19782:
---
Description: 
h3. *Motivation*

Implementations of some methods in RocksDbKeyValueStorage take a revision as a 
parameter, for example doGet, doGetAll and others. We need to check that this 
revision has not been compacted and throw an exception if the revision is 
lower than the compaction revision. For this purpose we need to know the last 
compacted revision.
h3. *Definition of Done*
 # Throw CompactedException if the revision in KeyValueStorage methods is lower 
than the compaction revision.
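
A minimal sketch of such a guard. The exception class below is a stand-in for 
the CompactedException mentioned above (its real constructor is not shown 
here), and the field and method names are assumptions.
{code:java}
/** Stand-in for the CompactedException mentioned in the Definition of Done. */
class CompactedRevisionException extends RuntimeException {
    CompactedRevisionException(String message) {
        super(message);
    }
}

/** Illustrative guard: rejects reads at revisions below the compaction revision. */
class RevisionGuard {
    /** Revision up to which the storage has been compacted; volatile so readers see progress. */
    private volatile long compactionRevision = -1;

    void onCompactionFinished(long upToRevision) {
        compactionRevision = upToRevision;
    }

    /** Throws if the requested revision is lower than the compaction revision. */
    void checkNotCompacted(long requestedRevision) {
        if (requestedRevision < compactionRevision) {
            throw new CompactedRevisionException(
                    "Requested revision " + requestedRevision
                            + " has been compacted (compaction revision: " + compactionRevision + ")");
        }
    }
}{code}
In the real storage, such a check would run at the start of doGet, doGetAll and 
the other revision-based methods.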

  was:
h3. *Motivation*

Implementations of some methods in RocksDbKeyValueStorage have a revision as a 
parameter. For example, methods doGet, doGetAll and other. Need to check that 
this revision is not compacted and throw exception if the revision is lower. 
For this purpose need to know the last compacted revision.
h3. *Definition of Done*
 # Created a method for obtaining the compaction revision.
 # Added assert that the revision in KeyValueStorage methods is higher than the 
compaction revision.


> Create an ability to obtain the compaction revision
> ---
>
> Key: IGNITE-19782
> URL: https://issues.apache.org/jira/browse/IGNITE-19782
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> Implementations of some methods in RocksDbKeyValueStorage have a revision as 
> a parameter. For example, methods doGet, doGetAll and other. Need to check 
> that this revision is not compacted and throw exception if the revision is 
> lower. For this purpose need to know the last compacted revision.
> h3. *Definition of Done*
>  # Throw CompactedException if the revision in KeyValueStorage methods is 
> lower than the compaction revision.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-19782) Create an ability to obtain the compaction revision

2023-06-22 Thread Sergey Uttsel (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-19782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Uttsel updated IGNITE-19782:
---
Issue Type: Improvement  (was: Bug)

> Create an ability to obtain the compaction revision
> ---
>
> Key: IGNITE-19782
> URL: https://issues.apache.org/jira/browse/IGNITE-19782
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Uttsel
>Priority: Major
>  Labels: ignite-3
>
> h3. *Motivation*
> Implementations of some methods in RocksDbKeyValueStorage have a revision as 
> a parameter. For example, methods doGet, doGetAll and other. Need to check 
> that this revision is not compacted and throw exception if the revision is 
> lower. For this purpose need to know the last compacted revision.
> h3. *Definition of Done*
>  # Created a method for obtaining the compaction revision.
>  # Added assert that the revision in KeyValueStorage methods is higher than 
> the compaction revision.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

