date:20240426

Vladislav Pyatkov created IGNITE-22129:
--

 Summary: Partition, CMG and metastorage should not share threads
 Key: IGNITE-22129
 URL: https://issues.apache.org/jira/browse/IGNITE-22129
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
These three subsystems serve different purposes, so sharing threads between them 
might lead to starvation. For this reason, Metastorage already has a separate FSM 
caller disruptor, but the other disruptors are still shared.
{code:java}
NodeImpl#ownFsmCallerExecutorDisruptorConfig
{code}

h3. Definition of done
At a minimum, all partition disruptor threads must be separate from the threads 
that are used by Metastorage and CMG.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-22128) Balancing partitions across stripes



 [ 
https://issues.apache.org/jira/browse/IGNITE-22128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-22128:
---
Description: 
h3. Motivation
Right now, we use a hash to balance partitions.
{code:java}
public int getStripe(NodeId nodeId) {
  return Math.abs(nodeId.hashCode() % stripes);
}
{code}
This approach might lead to a skew.

h3. Definition of done
Partitions are distributed statically by a true round-robin algorithm.

  was:
h3. Motivation
Right now, we use a hash to balance partitions.
{code:java}
public int getStripe(NodeId nodeId) {
  return Math.abs(nodeId.hashCode() % stripes);
}
{code}
This approach might lead to a skew.

h3. Definition of done
Partitions are distributed by the round-robin algorithm.


> Balancing partitions across stripes
> ---
>
> Key: IGNITE-22128
> URL: https://issues.apache.org/jira/browse/IGNITE-22128
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Right now, we use a hash to balance partitions.
> {code:java}
> public int getStripe(NodeId nodeId) {
>   return Math.abs(nodeId.hashCode() % stripes);
> }
> {code}
> This approach might lead to a skew.
> h3. Definition of done
> Partitions are distributed statically by a true round-robin algorithm.




[jira] [Created] (IGNITE-22128) Balancing partitions across stripes

Vladislav Pyatkov created IGNITE-22128:
--

 Summary: Balancing partitions across stripes
 Key: IGNITE-22128
 URL: https://issues.apache.org/jira/browse/IGNITE-22128
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
Right now, we use a hash to balance partitions.
{code:java}
public int getStripe(NodeId nodeId) {
  return Math.abs(nodeId.hashCode() % stripes);
}
{code}
This approach might lead to a skew.

h3. Definition of done
Partitions are distributed by the round-robin algorithm.
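
A minimal sketch of what a static round-robin assignment could look like, in contrast to the hash-based getStripe above. The class name RoundRobinStripeAssigner, the use of a String in place of NodeId, and the memoizing map are assumptions made for illustration; none of them come from the Ignite code base.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: hands each previously unseen node id the next stripe in turn. */
class RoundRobinStripeAssigner {
    private final int stripes;
    private final AtomicInteger next = new AtomicInteger();
    /** Remembers the stripe chosen for a node id, so the assignment stays static. */
    private final Map<String, Integer> assigned = new ConcurrentHashMap<>();

    RoundRobinStripeAssigner(int stripes) {
        if (stripes <= 0) {
            throw new IllegalArgumentException("stripes must be positive");
        }
        this.stripes = stripes;
    }

    int getStripe(String nodeId) {
        // floorMod keeps the index non-negative even if the counter overflows.
        return assigned.computeIfAbsent(
                nodeId, id -> Math.floorMod(next.getAndIncrement(), stripes));
    }
}
```

With this approach the stripe loads differ by at most one node id, whereas a hash can place several ids on the same stripe while leaving others empty. The trade-off is the extra map that has to be kept, and cleaned up when a node id goes away.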




[jira] [Created] (IGNITE-22127) Partition listener does not use batch update

Vladislav Pyatkov created IGNITE-22127:
--

 Summary: Partition listener does not use batch update
 Key: IGNITE-22127
 URL: https://issues.apache.org/jira/browse/IGNITE-22127
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


h3. Motivation
RAFT commands are batched in the FSM caller disruptor. The batch is passed as a 
collection to the partition replica listener, but each command is handled 
as if it were a single one.

h3. Definition of done
All commands in the command iterator have to be handled as a batch storage 
update.
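
A rough sketch of the intended behavior, assuming a storage that can accept a whole batch in one call. CountingStorage, writeBatch and BatchingListener are hypothetical names used only for this illustration, and commands are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical storage that counts how many write batches it receives. */
class CountingStorage {
    int batchesWritten;
    final List<String> rows = new ArrayList<>();

    void writeBatch(List<String> batch) {
        rows.addAll(batch);
        batchesWritten++;
    }
}

/** Hypothetical listener that applies the whole command iterator as one update. */
class BatchingListener {
    private final CountingStorage storage;

    BatchingListener(CountingStorage storage) {
        this.storage = storage;
    }

    void onWrite(Iterator<String> commands) {
        // Drain the iterator first, then hand everything to the storage at once.
        List<String> batch = new ArrayList<>();
        while (commands.hasNext()) {
            batch.add(commands.next());
        }
        if (!batch.isEmpty()) {
            storage.writeBatch(batch);
        }
    }
}
```

Three commands delivered in one iterator result in a single storage write here, instead of three separate ones.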




[jira] [Updated] (IGNITE-22124) Thin 3.0: Implement MapReduce API



 [ 
https://issues.apache.org/jira/browse/IGNITE-22124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Pakhnushev updated IGNITE-22124:
--
Description: Implement {{ClientTaskExecution}} in Java client.  (was: 
Implement {{ClientTaskExecution}}.)

> Thin 3.0: Implement MapReduce API
> -
>
> Key: IGNITE-22124
> URL: https://issues.apache.org/jira/browse/IGNITE-22124
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute, thin client
>Reporter: Vadim Pakhnushev
>Priority: Major
>  Labels: ignite-3
>
> Implement {{ClientTaskExecution}} in Java client.




[jira] [Created] (IGNITE-22126) C++: Thin 3.0: Implement MapReduce API

Vadim Pakhnushev created IGNITE-22126:
-

 Summary: C++: Thin 3.0: Implement MapReduce API
 Key: IGNITE-22126
 URL: https://issues.apache.org/jira/browse/IGNITE-22126
 Project: Ignite
  Issue Type: Improvement
  Components: compute, platforms, thin client
Reporter: Vadim Pakhnushev


Implement {{ClientTaskExecution}} in .NET client.




[jira] [Updated] (IGNITE-22126) C++: Thin 3.0: Implement MapReduce API



 [ 
https://issues.apache.org/jira/browse/IGNITE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Pakhnushev updated IGNITE-22126:
--
Description: Implement {{ClientTaskExecution}} in C++ client.  (was: 
Implement {{ClientTaskExecution}} in .NET client.)

> C++: Thin 3.0: Implement MapReduce API
> --
>
> Key: IGNITE-22126
> URL: https://issues.apache.org/jira/browse/IGNITE-22126
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute, platforms, thin client
>Reporter: Vadim Pakhnushev
>Priority: Major
>  Labels: ignite-3
>
> Implement {{ClientTaskExecution}} in C++ client.




[jira] [Updated] (IGNITE-22125) .NET: Thin 3.0: Implement MapReduce API



 [ 
https://issues.apache.org/jira/browse/IGNITE-22125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Pakhnushev updated IGNITE-22125:
--
Description: Implement {{ClientTaskExecution}} in .NET client.  (was: 
Implement {{ClientTaskExecution}}.)

> .NET: Thin 3.0: Implement MapReduce API
> ---
>
> Key: IGNITE-22125
> URL: https://issues.apache.org/jira/browse/IGNITE-22125
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute, thin client
>Reporter: Vadim Pakhnushev
>Priority: Major
>  Labels: ignite-3
>
> Implement {{ClientTaskExecution}} in .NET client.




[jira] [Created] (IGNITE-22125) .NET: Thin 3.0: Implement MapReduce API

Vadim Pakhnushev created IGNITE-22125:
-

 Summary: .NET: Thin 3.0: Implement MapReduce API
 Key: IGNITE-22125
 URL: https://issues.apache.org/jira/browse/IGNITE-22125
 Project: Ignite
  Issue Type: Improvement
  Components: compute, thin client
Reporter: Vadim Pakhnushev


Implement {{ClientTaskExecution}}.




[jira] [Updated] (IGNITE-22125) .NET: Thin 3.0: Implement MapReduce API



 [ 
https://issues.apache.org/jira/browse/IGNITE-22125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Pakhnushev updated IGNITE-22125:
--
Component/s: platforms

> .NET: Thin 3.0: Implement MapReduce API
> ---
>
> Key: IGNITE-22125
> URL: https://issues.apache.org/jira/browse/IGNITE-22125
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute, platforms, thin client
>Reporter: Vadim Pakhnushev
>Priority: Major
>  Labels: ignite-3
>
> Implement {{ClientTaskExecution}} in .NET client.




[jira] [Created] (IGNITE-22124) Thin 3.0: Implement MapReduce API

Vadim Pakhnushev created IGNITE-22124:
-

 Summary: Thin 3.0: Implement MapReduce API
 Key: IGNITE-22124
 URL: https://issues.apache.org/jira/browse/IGNITE-22124
 Project: Ignite
  Issue Type: Improvement
  Components: compute, thin client
Reporter: Vadim Pakhnushev







[jira] [Updated] (IGNITE-22124) Thin 3.0: Implement MapReduce API



 [ 
https://issues.apache.org/jira/browse/IGNITE-22124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Pakhnushev updated IGNITE-22124:
--
Description: Implement {{ClientTaskExecution}}.

> Thin 3.0: Implement MapReduce API
> -
>
> Key: IGNITE-22124
> URL: https://issues.apache.org/jira/browse/IGNITE-22124
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute, thin client
>Reporter: Vadim Pakhnushev
>Priority: Major
>  Labels: ignite-3
>
> Implement {{ClientTaskExecution}}.




[jira] [Updated] (IGNITE-21908) Add metrics of distribution among stripes in disruptor



 [ 
https://issues.apache.org/jira/browse/IGNITE-21908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-21908:
---
Description: 
h3. Motivation
The metrics are useful to estimate the uniformity of the distribution.

h3. Implementation notes
These metrics can be implemented using the common approach, which is based on 
{{MetricSource}} interface.

h3. Definition of done
Metrics that become available:
* histogram of batch size
* number of operations processed


  was:
h3. Motivation
The metrics are useful to estimate the uniformity of the distribution.

h3. Implementation notes
These metrics can be implemented using the common approach, which is based on 
{{MetricSource}} interface.

h3. Definition of done
Metrics that become available:
* average batch size
* number of operations processed



> Add metrics of distribution among stripes in disruptor
> --
>
> Key: IGNITE-21908
> URL: https://issues.apache.org/jira/browse/IGNITE-21908
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> The metrics are useful to estimate the uniformity of the distribution.
> h3. Implementation notes
> These metrics can be implemented using the common approach, which is based on 
> {{MetricSource}} interface.
> h3. Definition of done
> Metrics that become available:
> * histogram of batch size
> * number of operations processed
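
For illustration, the two metrics could be collected per stripe roughly as follows. This is only a sketch under stated assumptions: StripeBatchMetrics and its methods are hypothetical, and the sketch does not use or reproduce the {{MetricSource}} interface mentioned above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical per-stripe metrics: a batch-size histogram and an operation counter. */
class StripeBatchMetrics {
    /** Inclusive upper bounds of the buckets, in ascending order. */
    private final long[] bounds;
    /** One counter per bound, plus a last bucket for sizes above every bound. */
    private final AtomicLongArray buckets;
    private final AtomicLong operationsProcessed = new AtomicLong();

    StripeBatchMetrics(long... bounds) {
        this.bounds = bounds.clone();
        this.buckets = new AtomicLongArray(bounds.length + 1);
    }

    void onBatch(int batchSize) {
        int bucket = 0;
        while (bucket < bounds.length && batchSize > bounds[bucket]) {
            bucket++;
        }
        buckets.incrementAndGet(bucket);
        operationsProcessed.addAndGet(batchSize);
    }

    long bucketCount(int bucket) {
        return buckets.get(bucket);
    }

    long operationsProcessed() {
        return operationsProcessed.get();
    }
}
```

Comparing the histograms and operation counters of different stripes gives a direct view of how uniform the distribution is.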




[jira] [Created] (IGNITE-22123) Critical thread blocked on 2 node cluster with intensive create tables

2024-04-26 Thread Alexander Belyak (Jira)

Alexander Belyak created IGNITE-22123:
-

 Summary: Critical thread blocked on 2 node cluster with intensive 
create tables
 Key: IGNITE-22123
 URL: https://issues.apache.org/jira/browse/IGNITE-22123
 Project: Ignite
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Alexander Belyak


# Start a 2-node cluster (I use a single host for both nodes)
 # Connect to the first node and do the following in a loop:
 ## create a table (each with a different 8-column PK)
 ## insert a few rows into it
 ## select a row from the table

Expected result: the test passes without errors

Actual result:

The client gets:
{noformat}
16:45:20.253 [junit-timeout-thread-119] INFO  o.g.a.t.teststeps.ThinClientSteps 
- Query: drop table if exists 
eight_different_types_TINYINT_INTEGER_FLOAT_TINYINT_TINYINT_TINYINT_TINYINT_TINYINT
Apr 26, 2024 4:45:20 PM org.apache.ignite.internal.logger.IgniteLogger 
logInternal
INFO: Partition assignment change notification received 
[remoteAddress=localhost:10800]
16:45:20.266 [junit-timeout-thread-119] INFO  o.g.a.t.teststeps.ThinClientSteps 
- Query: create table 
eight_different_types_TINYINT_INTEGER_FLOAT_TINYINT_TINYINT_TINYINT_TINYINT_TINYINT(keyTINYINT0
 TINYINT not null, keyINTEGER1 INTEGER not null, keyFLOAT2 FLOAT not null, 
keyTINYINT3 TINYINT not null, keyTINYINT4 TINYINT not null, keyTINYINT5 TINYINT 
not null, keyTINYINT6 TINYINT not null, keyTINYINT7 TINYINT not null, val 
INTEGER not null, primary key (keyTINYINT0, keyINTEGER1, keyFLOAT2, 
keyTINYINT3, keyTINYINT4, keyTINYINT5, keyTINYINT6, keyTINYINT7))
16:45:28.570 [junit-timeout-thread-119] INFO  o.g.a.t.teststeps.ThinClientSteps 
- Query: insert into 
eight_different_types_TINYINT_INTEGER_FLOAT_TINYINT_TINYINT_TINYINT_TINYINT_TINYINT(keyTINYINT0,
 keyINTEGER1, keyFLOAT2, keyTINYINT3, keyTINYINT4, keyTINYINT5, keyTINYINT6, 
keyTINYINT7, val) values (-96, 25781810, 5.0, -93, -92, -91, -90, -89, 
116513)
org.apache.ignite.sql.SqlException: IGN-PLACEMENTDRIVER-1 
TraceId:e21f4eb1-4ecb-4ea2-aaf7-62a08b677f20 Failed to get the primary replica 
[tablePartitionId=85_part_18, awaitTimestamp=HybridTimestamp 
[physical=2024-04-26 16:45:28:599 +0300, logical=0, 
composite=112337821931864064]]
    at 
java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710)
    at 
org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:765)
    at 
org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:699)
    at 
org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
    at 
org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:634)
    at 
org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:476)
    at 
org.apache.ignite.internal.client.sql.ClientSql.execute(ClientSql.java:94)
    at 
org.gridgain.ai3tests.tests.teststeps.ThinClientSteps.lambda$executeQuery$0(ThinClientSteps.java:61)
    at io.qameta.allure.Allure.lambda$step$1(Allure.java:127)
    at io.qameta.allure.Allure.step(Allure.java:181)
    at io.qameta.allure.Allure.step(Allure.java:125)
    at 
org.gridgain.ai3tests.tests.teststeps.ThinClientSteps.executeQuery(ThinClientSteps.java:61)
    at 
org.gridgain.ai3tests.tests.PrimaryKeyConstraintsTest.testTypes(PrimaryKeyConstraintsTest.java:167)
    at 
org.gridgain.ai3tests.tests.PrimaryKeyConstraintsTest.test8Columns(PrimaryKeyConstraintsTest.java:120)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at 
io.qameta.allure.junit5.AllureJunit5.interceptTestTemplateMethod(AllureJunit5.java:59)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.CompletionException: 
org.apache.ignite.sql.SqlException: IGN-PLACEMENTDRIVER-1 
TraceId:e21f4eb1-4ecb-4ea2-aaf7-62a08b677f20 Failed to get the primary replica 
[tablePartitionId=85_part_18, awaitTimestamp=HybridTimestamp 
[physical=2024-04-26 16:45:28:599 +0300, logical=0, 
composite=112337821931864064]]
    at 
java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    at 
java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
    at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
    at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2094)

[jira] [Updated] (IGNITE-22091) CLI for disaster recovery: partition states

2024-04-26 Thread Philipp Shergalis (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-22091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Shergalis updated IGNITE-22091:
---
Description: 
 
{code:java}
ignite cluster partition-states --cluster-url  [--local [--nodes 
] | --global] [--zones ] [--partitions ]
{code}
 

Output for local:
|Zone name|Node name|Table name|Partition ID|State|
|ZONE_NAME|node_name|TABLE_NAME|1|HEALTHY|

 

For global:
|Zone name|Table name|Partition ID|State|
|ZONE_NAME|TABLE_NAME|1|HEALTHY|

  was:
 
{code:java}
ignite cluster partition-states --cluster-url  [--local [--nodes 
] | --global] [--zones ] [--partitions ] 
[--plain] 
{code}
 

Output for local:
|Node name|Table name|Partition ID|State|
|node_name|TABLE_NAME|1|HEALTHY|

 

For global:
|Table name|Partition ID|State|
|TABLE_NAME|1|HEALTHY|


> CLI for disaster recovery: partition states
> ---
>
> Key: IGNITE-22091
> URL: https://issues.apache.org/jira/browse/IGNITE-22091
> Project: Ignite
>  Issue Type: Task
>  Components: cli
>Reporter: Philipp Shergalis
>Assignee: Philipp Shergalis
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> ignite cluster partition-states --cluster-url  [--local [--nodes 
> ] | --global] [--zones ] [--partitions ]
> {code}
>  
> Output for local:
> |Zone name|Node name|Table name|Partition ID|State|
> |ZONE_NAME|node_name|TABLE_NAME|1|HEALTHY|
>  
> For global:
> |Zone name|Table name|Partition ID|State|
> |ZONE_NAME|TABLE_NAME|1|HEALTHY|




[jira] [Comment Edited] (IGNITE-21538) Rework component lifecycle mode to asynchronous

2024-04-26 Thread Jira



[ 
https://issues.apache.org/jira/browse/IGNITE-21538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841264#comment-17841264
 ] 

 Kirill Sizov edited comment on IGNITE-21538 at 4/26/24 2:50 PM:
-

Created a PoC in the attached github PR 
https://github.com/apache/ignite-3/pull/3557


was (Author: JIRAUSER301198):
Created a PoC in the attached github PR. 

> Rework component lifecycle mode to asynchronous
> ---
>
> Key: IGNITE-21538
> URL: https://issues.apache.org/jira/browse/IGNITE-21538
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 3.0.0-beta2
>Reporter: Alexey Scherbakov
>Assignee:  Kirill Sizov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0
>
>
> The current design of the component lifecycle has issues:
> 1. It was designed for synchronous components, but almost all components are 
> asynchronous. 
> This leads to ugly constructs like _inBusyLockAsync_ and inefficient code 
> like
> _public @Nullable TxStateMeta stateMeta(UUID txId) \{ return 
> inBusyLock(busyLock, () -> txStateVolatileStorage.state(txId)); }_
> 2. Currently it's not possible to do a truly graceful node shutdown, because the IO 
> layer is disabled out of order, causing operations to fail without a chance 
> to finish.
> I suggest reworking the component lifecycle to an async model:
> 1. Each component tracks its in-flight async ops (as a list of async chains)
> 2. On start, components are initialized using the _CompletableFuture 
> startAsync()_ method, from root to leaves of the dependency tree
> 3. On shutdown 
> 3.1 _CompletableFuture beforeShutdown_ is called on components in leaf-to-root 
> order in the dependency tree. This step waits for all active futures to 
> complete. Any new operation returns a future completed with 
> _NodeStoppingException_
> 3.2 stop is called on components in leaf-to-root order in the dependency tree. 
> This step destroys component resources, like pools, etc.
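
The proposed model can be sketched as follows. This is not the code from the PoC PR: AsyncComponent, TrackingComponent, Lifecycle and the track helper are hypothetical names, the dependency tree is flattened into a list ordered from root to leaves, and IllegalStateException stands in for NodeStoppingException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical component contract with the three phases from the proposal. */
interface AsyncComponent {
    CompletableFuture<Void> startAsync();

    CompletableFuture<Void> beforeShutdown();

    void stop();
}

/** Tracks in-flight operations so that beforeShutdown() can wait for them. */
class TrackingComponent implements AsyncComponent {
    private final Set<CompletableFuture<?>> inflight = ConcurrentHashMap.newKeySet();
    private volatile boolean stopping;
    volatile boolean stopped;

    /** Registers an operation, or rejects it once shutdown has begun. */
    <T> CompletableFuture<T> track(CompletableFuture<T> op) {
        if (stopping) {
            // Stand-in for NodeStoppingException, which is not defined in this sketch.
            return CompletableFuture.failedFuture(new IllegalStateException("node is stopping"));
        }
        // A real implementation would close the race between this check and the add.
        inflight.add(op);
        op.whenComplete((res, err) -> inflight.remove(op));
        return op;
    }

    @Override
    public CompletableFuture<Void> startAsync() {
        return CompletableFuture.completedFuture(null);
    }

    @Override
    public CompletableFuture<Void> beforeShutdown() {
        stopping = true;
        // Wait for every tracked operation, whether it succeeds or fails.
        return CompletableFuture.allOf(inflight.toArray(new CompletableFuture[0]))
                .<Void>handle((res, err) -> null);
    }

    @Override
    public void stop() {
        stopped = true;
    }
}

class Lifecycle {
    /** Starts components one after another, from root to leaves. */
    static CompletableFuture<Void> startAll(List<? extends AsyncComponent> rootToLeaf) {
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (AsyncComponent c : rootToLeaf) {
            chain = chain.thenCompose(v -> c.startAsync());
        }
        return chain;
    }

    /** Runs beforeShutdown() from leaves to root, then stop() in the same order. */
    static CompletableFuture<Void> stopAll(List<? extends AsyncComponent> rootToLeaf) {
        List<AsyncComponent> leafToRoot = new ArrayList<>(rootToLeaf);
        Collections.reverse(leafToRoot);
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (AsyncComponent c : leafToRoot) {
            chain = chain.thenCompose(v -> c.beforeShutdown());
        }
        return chain.thenRun(() -> leafToRoot.forEach(AsyncComponent::stop));
    }
}
```

In this sketch the future returned by stopAll does not complete until every tracked operation has finished, and resources are released only after that point.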




[jira] [Commented] (IGNITE-21538) Rework component lifecycle mode to asynchronous

2024-04-26 Thread Jira



[ 
https://issues.apache.org/jira/browse/IGNITE-21538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841264#comment-17841264
 ] 

 Kirill Sizov commented on IGNITE-21538:


Created a PoC in the attached github PR. 

> Rework component lifecycle mode to asynchronous
> ---
>
> Key: IGNITE-21538
> URL: https://issues.apache.org/jira/browse/IGNITE-21538
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 3.0.0-beta2
>Reporter: Alexey Scherbakov
>Assignee:  Kirill Sizov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0
>
>
> The current design of the component lifecycle has issues:
> 1. It was designed for synchronous components, but almost all components are 
> asynchronous. 
> This leads to ugly constructs like _inBusyLockAsync_ and inefficient code 
> like
> _public @Nullable TxStateMeta stateMeta(UUID txId) \{ return 
> inBusyLock(busyLock, () -> txStateVolatileStorage.state(txId)); }_
> 2. Currently it's not possible to do a truly graceful node shutdown, because the IO 
> layer is disabled out of order, causing operations to fail without a chance 
> to finish.
> I suggest reworking the component lifecycle to an async model:
> 1. Each component tracks its in-flight async ops (as a list of async chains)
> 2. On start, components are initialized using the _CompletableFuture 
> startAsync()_ method, from root to leaves of the dependency tree
> 3. On shutdown 
> 3.1 _CompletableFuture beforeShutdown_ is called on components in leaf-to-root 
> order in the dependency tree. This step waits for all active futures to 
> complete. Any new operation returns a future completed with 
> _NodeStoppingException_
> 3.2 stop is called on components in leaf-to-root order in the dependency tree. 
> This step destroys component resources, like pools, etc.




[jira] [Assigned] (IGNITE-19082) Catalog. Cleanup dead code.

2024-04-26 Thread Iurii Gerzhedovich (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-19082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iurii Gerzhedovich reassigned IGNITE-19082:
---

Assignee: Maksim Zhuravkov

> Catalog. Cleanup dead code.
> ---
>
> Key: IGNITE-19082
> URL: https://issues.apache.org/jira/browse/IGNITE-19082
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Andrey Mashenkov
>Assignee: Maksim Zhuravkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Let's
>  * ensure Catalog is used by default as a single schema management point by 
> TableManager, IndexManager, SqlSchemaManager, SchemaRegistry.
>  * drop schema related code from configuration.
>  * drop outdated code from TableManager, IndexManager, SqlSchemaManager, 
> SchemaRegistry.
>  * make a PR for merging feature branch to main (if applicable).
>  * ensure there are end-to-end tests for the cases (if applicable) described 
> in CatalogServiceSelfTest. Also drop InternalSchemaTest.
>  * Let’s stop using/keeping the schema name instead of the schema id (NewIndexEntry 
> class as an example)




[jira] (IGNITE-18492) SQL: Inconsistent behavior of LENGTH limit for CHAR data type

2024-04-26 Thread Evgeny Stanilovsky (Jira)



[ https://issues.apache.org/jira/browse/IGNITE-18492 ]


Evgeny Stanilovsky deleted comment on IGNITE-18492:
-

was (Author: zstan):
[~jooger] [~amashenkov] [~mzhuravkov] guys can you make a review plz ?

> SQL: Inconsistent behavior of LENGTH limit for CHAR data type
> -
>
> Key: IGNITE-18492
> URL: https://issues.apache.org/jira/browse/IGNITE-18492
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 3.0.0-beta1
>Reporter: Andrey Khitrin
>Assignee: Evgeny Stanilovsky
>Priority: Major
>  Labels: ignite-3, sql
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When I create a table with {{CHAR(length)}} column, it's still possible to 
> insert character values with length greater than given limit.
> {code:sql}
> sql-cli> create table xx (pk int primary key, f1 char(5));
> Updated 0 rows.
> sql-cli> insert into xx values (1, 'abcdefgh');
> Updated 1 rows.
> sql-cli> select * from xx;
> ╔════╤══════════╗
> ║ PK │ F1       ║
> ╠════╪══════════╣
> ║ 1  │ abcdefgh ║
> ╚════╧══════════╝
> {code}
> On the other hand, the length limit is applied when I insert a non-char value that's 
> implicitly cast to {{CHAR}}. With the same table as above:
> {code:sql}
> sql-cli> insert into xx values (2, 1234567);
> Updated 1 rows.
> sql-cli> select * from xx;
> ╔════╤══════════╗
> ║ PK │ F1       ║
> ╠════╪══════════╣
> ║ 2  │ 12345    ║
> ╟────┼──────────╢
> ║ 1  │ abcdefgh ║
> ╚════╧══════════╝
> {code}
> Behavior should be consistent: either truncate both values to the given 
> length limit, or reject values that are too long in both cases (as is done 
> in other DBs, like PostgreSQL).
>  
> Dynamic params can be processed too; check 
> IgniteSqlValidator#inferDynamicParamType
> NOTE
> VARCHAR is also affected
> {color:#505f79}^(this note was added so that the ticket would be included in 
> the search for the keyword VARCHAR)^{color}
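
The two consistent options named in the description can be sketched like this. CharLengthPolicy and its methods are hypothetical and only illustrate the choice; they do not describe how the Ignite SQL engine or IgniteSqlValidator is implemented.

```java
/** Hypothetical helper showing the two consistent policies for CHAR(n) values. */
final class CharLengthPolicy {
    private CharLengthPolicy() {
    }

    /** Truncating policy: cut the value down to the declared length. */
    static String truncate(String value, int length) {
        return value.length() <= length ? value : value.substring(0, length);
    }

    /** Rejecting policy: refuse any value longer than the declared length. */
    static String reject(String value, int length) {
        if (value.length() > length) {
            throw new IllegalArgumentException("Value too long for CHAR(" + length + ")");
        }
        return value;
    }
}
```

Whichever policy is chosen, the point of the ticket is that the same one applies both to a character literal such as 'abcdefgh' and to a value like 1234567 that is converted to a string first.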




[jira] [Assigned] (IGNITE-19082) Catalog. Cleanup dead code.

2024-04-26 Thread Maksim Zhuravkov (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-19082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maksim Zhuravkov reassigned IGNITE-19082:
-

Assignee: (was: Maksim Zhuravkov)

> Catalog. Cleanup dead code.
> ---
>
> Key: IGNITE-19082
> URL: https://issues.apache.org/jira/browse/IGNITE-19082
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Andrey Mashenkov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Let's
>  * ensure Catalog is used by default as a single schema management point by 
> TableManager, IndexManager, SqlSchemaManager, SchemaRegistry.
>  * drop schema related code from configuration.
>  * drop outdated code from TableManager, IndexManager, SqlSchemaManager, 
> SchemaRegistry.
>  * make a PR for merging feature branch to main (if applicable).
>  * ensure there are end-to-end tests for the cases (if applicable) described 
> in CatalogServiceSelfTest. Also drop InternalSchemaTest.
>  * Let’s stop using/keeping the schema name instead of the schema id (NewIndexEntry 
> class as an example)




[jira] [Updated] (IGNITE-22121) Change parameters to disaster recovery partition states api

2024-04-26 Thread Kirill Tkalenko (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-22121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko updated IGNITE-22121:
-
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Change parameters to disaster recovery partition states api
> ---
>
> Key: IGNITE-22121
> URL: https://issues.apache.org/jira/browse/IGNITE-22121
> Project: Ignite
>  Issue Type: Improvement
>  Components: rest
>Reporter: Philipp Shergalis
>Assignee: Philipp Shergalis
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For getLocalPartitionStates/getGlobalPartitionStates add partitionIds and 
> nodeNames as parameters, replace zoneName with zoneNames




[jira] [Updated] (IGNITE-22121) Change parameters to disaster recovery partition states api

2024-04-26 Thread Kirill Tkalenko (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-22121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko updated IGNITE-22121:
-
Labels: ignite-3  (was: )

> Change parameters to disaster recovery partition states api
> ---
>
> Key: IGNITE-22121
> URL: https://issues.apache.org/jira/browse/IGNITE-22121
> Project: Ignite
>  Issue Type: Improvement
>  Components: rest
>Reporter: Philipp Shergalis
>Assignee: Philipp Shergalis
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For getLocalPartitionStates/getGlobalPartitionStates add partitionIds and 
> nodeNames as parameters, replace zoneName with zoneNames




[jira] [Commented] (IGNITE-22081) Delete TransactionSerializationException and TransactionAlreadyCompletedException

2024-04-26 Thread Ignite TC Bot (Jira)



[ 
https://issues.apache.org/jira/browse/IGNITE-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841152#comment-17841152
 ] 

Ignite TC Bot commented on IGNITE-22081:


{panel:title=Branch: [pull/11324/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/11324/head] Base: [master] : No new tests 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *-- Run :: All* 
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7837072&buildTypeId=IgniteTests24Java8_RunAll]

> Delete TransactionSerializationException and 
> TransactionAlreadyCompletedException
> -
>
> Key: IGNITE-22081
> URL: https://issues.apache.org/jira/browse/IGNITE-22081
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Julia Bakulina
>Assignee: Andrei Nadyktov
>Priority: Trivial
>  Labels: ise, newbie
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Delete TransactionSerializationException and 
> TransactionAlreadyCompletedException that are unused after MVCC code removal




[jira] [Created] (IGNITE-22122) Fix timeouts for waitForActualStates

2024-04-26 Thread Mirza Aliev (Jira)

Mirza Aliev created IGNITE-22122:


 Summary: Fix timeouts for waitForActualStates
 Key: IGNITE-22122
 URL: https://issues.apache.org/jira/browse/IGNITE-22122
 Project: Ignite
  Issue Type: Bug
Reporter: Mirza Aliev


There are several places in the code of the {{ignite-collocation-feature}} branch 
where timeouts are passed as 1 ms, but these values must be justified somehow. These 
timeouts are used in {{Replica#waitForActualState}} as an expirationTime for 
retrying the readIndex command.




[jira] [Updated] (IGNITE-21881) Deal with retry send metastorage raft commands after a timeout

2024-04-26 Thread Alexander Lapin (Jira)

[
https://issues.apache.org/jira/browse/IGNITE-21881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Alexander Lapin updated IGNITE-21881:
-
Description:
As a result of the analysis and reproduction of IGNITE-21142, it was found that
the metastorage raft command can be re-sent if it does not time out, which may
not be good and lead to hidden negative consequences, such as in IGNITE-21142.

Here we need to find out the reasons for this decision (with re-try by timeout)
and understand what to do next. I think we should use an infinite timeout.

h3. Upd#1

As discussed, it's required to detect whether InvokeCommand was already
processed on a server and resend original response if true instead of
reprocessing. First of all it's not only about invoke but also about
multiInvoke. Worth mentioning though that it relates only to MS and maybe CMG
but not Partitions: within partitions, tx protocol along with returning result
from indexes instead of returning result from raft, protects us from
non-idempotent command processing.

All in all, the following solution is expected to be implemented:
* A new interface, NonIdempotentCommand, is introduced with an id field.
* All MS non-idempotent commands, like InvokeCommand and MultiInvokeCommand,
implement the aforementioned interface.
* On the client side, an identifier is added to the command. Two options are
possible here:
** Set the id on the command at command creation. This is the easiest
way, but it will require extra effort on the server side to track command
time. In that case it's possible to use LongCounter + nodeId as an id.
** Or assign the id to the command within the retry loop. In that
case we may use the id as a "command time"; of course, it also means that the clock or
System.currentTime<> should be used as the id. I strongly believe that the first option
is better for now.
* On the server side, specifically within the MS state machine, a new
nonIdempotentCommandCache is introduced: commandId -> (commandResult,
commandStartTime)
* For each NonIdempotentCommand the following logic should be implemented:
** As an initial step, check whether there's a command with the
given id in the cache; if so, just return the cached result without
reprocessing the command.
** If the given command is not in the cache, process it and populate the cache
with the result.

Basically that's all. Both cache persistence and recovery on group restart and
cache cleanup will be covered within separate tickets.
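
The proposed flow can be illustrated with a minimal sketch. It is not the actual
Ignite implementation: InvokeCommandSketch, DedupStateMachineSketch, the String
id format (nodeId plus a counter) and the use of a plain HashMap are assumptions
made for illustration only. Persistence, recovery and cleanup of the cache are
deliberately left out, as in the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical marker interface for commands that must not be applied twice. */
interface NonIdempotentCommand {
    /** Id assigned once, on command creation, and reused on every retry. */
    String id();
}

/** Illustrative command; the id is built from nodeId and a counter (the first option). */
final class InvokeCommandSketch implements NonIdempotentCommand {
    private static final AtomicLong COUNTER = new AtomicLong();

    private final String id;

    InvokeCommandSketch(String nodeId) {
        this.id = nodeId + '-' + COUNTER.incrementAndGet();
    }

    @Override
    public String id() {
        return id;
    }
}

/** Illustrative fragment of a state machine that deduplicates re-sent commands. */
final class DedupStateMachineSketch {
    /** Cached result together with the time the command was first processed. */
    private static final class Entry {
        final Object result;
        final long startTimeMillis;

        Entry(Object result, long startTimeMillis) {
            this.result = result;
            this.startTimeMillis = startTimeMillis;
        }
    }

    // Assumption: commands are applied one at a time, so a plain HashMap is enough here.
    private final Map<String, Entry> nonIdempotentCommandCache = new HashMap<>();

    Object process(NonIdempotentCommand cmd, Function<NonIdempotentCommand, Object> handler) {
        Entry cached = nonIdempotentCommandCache.get(cmd.id());

        if (cached != null) {
            // The command was re-sent: return the original response without reprocessing.
            return cached.result;
        }

        Object result = handler.apply(cmd);

        nonIdempotentCommandCache.put(cmd.id(), new Entry(result, System.currentTimeMillis()));

        return result;
    }
}
```

Because the id is fixed at creation time, a retry of the same command object hits
the cache, while a new command from the same node gets a fresh id and is processed
normally.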

was:
As a result of the analysis and reproduction of IGNITE-21142, it was found that
the metastorage raft command can be re-sent when it does not complete within the
timeout, which is undesirable and may lead to hidden negative consequences, such
as those in IGNITE-21142.

We need to find out the reasons behind this decision (retrying on timeout) and
decide what to do next. I think we should use an infinite timeout.

h3. Upd#1

As discussed, we need to detect whether an InvokeCommand was already processed
on the server and, if so, resend the original response instead of reprocessing
the command. This applies not only to invoke but to all non-idempotent commands
such as getAndPut, getAndPutAll, getAndRemove, etc. Note that it concerns only
MS and maybe CMG, but not Partitions: within partitions, the tx protocol, along
with returning the result from indexes instead of from raft, protects us from
repeated processing of non-idempotent commands.

[jira] [Updated] (IGNITE-22091) CLI for disaster recovery: partition states

2024-04-26 Thread Philipp Shergalis (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-22091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Shergalis updated IGNITE-22091:
---
Component/s: cli

> CLI for disaster recovery: partition states
> ---
>
> Key: IGNITE-22091
> URL: https://issues.apache.org/jira/browse/IGNITE-22091
> Project: Ignite
>  Issue Type: Task
>  Components: cli
>Reporter: Philipp Shergalis
>Assignee: Philipp Shergalis
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> ignite cluster partition-states --cluster-url  [--local [--nodes 
> ] | --global] [--zones ] [--partitions ] 
> [--plain] 
> {code}
>  
> Output for local:
> |Node name|Table name|Partition ID|State|
> |node_name|TABLE_NAME|1|HEALTHY|
>  
> For global:
> |Table name|Partition ID|State|
> |TABLE_NAME|1|HEALTHY|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-22121) Change parameters to disaster recovery partition states api

2024-04-26 Thread Philipp Shergalis (Jira)

Philipp Shergalis created IGNITE-22121:
--

 Summary: Change parameters to disaster recovery partition states 
api
 Key: IGNITE-22121
 URL: https://issues.apache.org/jira/browse/IGNITE-22121
 Project: Ignite
  Issue Type: Improvement
  Components: rest
Reporter: Philipp Shergalis
Assignee: Philipp Shergalis


For getLocalPartitionStates/getGlobalPartitionStates add partitionIds and 
nodeNames as parameters, replace zoneName with zoneNames




[jira] [Updated] (IGNITE-21953) Cover SQL E021-01(Character string types. CHARACTER data type) feature by tests

2024-04-26 Thread Iurii Gerzhedovich (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-21953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iurii Gerzhedovich updated IGNITE-21953:

Description: 
We don't have any tests at all for the E021-01 (Character string types. CHARACTER 
data type) SQL feature.
Let's cover it and create tickets for any issues found in the covered area.

 

  was:
We don't have any tests at all for the E021-01 (Character string types. CHARACTER 
data type) SQL feature.
Let's cover it and create tickets for any issues found in the covered area.


> Cover SQL E021-01(Character string types. CHARACTER data type) feature by 
> tests
> ---
>
> Key: IGNITE-21953
> URL: https://issues.apache.org/jira/browse/IGNITE-21953
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Iurii Gerzhedovich
>Assignee: Iurii Gerzhedovich
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We don't have any tests at all for the E021-01 (Character string types. 
> CHARACTER data type) SQL feature.
> Let's cover it and create tickets for any issues found in the covered area.
>  




[jira] [Created] (IGNITE-22120) Sql. Possibility set zero length for CHAR type

2024-04-26 Thread Iurii Gerzhedovich (Jira)

Iurii Gerzhedovich created IGNITE-22120:
---

 Summary: Sql. Possibility set zero length for CHAR type
 Key: IGNITE-22120
 URL: https://issues.apache.org/jira/browse/IGNITE-22120
 Project: Ignite
  Issue Type: Improvement
  Components: sql
Reporter: Iurii Gerzhedovich


The Syntax Rules of paragraph 6.1 of the SQL Standard say: The value of a <length> 
shall be greater than 0 (zero).

However, we can set a zero length for the CHAR type:

{code:java}
CREATE TABLE t_zero(c1 CHAR(0)); -- no error{code}

This needs to be fixed.
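
The rule itself is simple to express as a check. The sketch below only
illustrates that check; CharLengthValidatorSketch and validateLength are
hypothetical names and do not refer to the actual Ignite SQL validation code.

```java
/** Hypothetical validator illustrating the "length shall be greater than 0" rule. */
final class CharLengthValidatorSketch {
    private CharLengthValidatorSketch() {
    }

    /**
     * Returns the length if it is valid for a character string type.
     *
     * @throws IllegalArgumentException if the length is zero or negative.
     */
    static int validateLength(String typeName, int length) {
        if (length <= 0) {
            throw new IllegalArgumentException(
                    typeName + " length must be greater than 0, got " + length);
        }

        return length;
    }
}
```

With such a check in place, a definition like CHAR(0) would be rejected at
validation time instead of being accepted silently.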




[jira] [Updated] (IGNITE-22119) Fix start of partial nodes from ItIgniteNodeRestartTest and ItIgniteDistributionZoneManagerNodeRestartTest

2024-04-26 Thread Mirza Aliev (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-22119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-22119:
-
Description: 
h3.  Motivation

After https://issues.apache.org/jira/browse/IGNITE-22071 is implemented, there 
are a couple of places where we explicitly start components for test purposes, 
so that a node starts only the needed components. In that code, we have a 
common pattern like 
{code:java}
for (IgniteComponent component : otherComponents) {
    component.start();

    components.add(component);
} {code}
In the review phase of https://issues.apache.org/jira/browse/IGNITE-22071 it 
was proposed to wait here for the component to be started.


The problem is that in the common node start you have to call 
{{metaStorageMgr.notifyRevisionUpdateListenerOnStart()}}, so that some 
components (like {{DistributionZoneManager}}) can proceed with their logic, 
because they wait for some VersionedValues:

 
{code:java}
CompletableFuture startupConfigurationUpdate = 
        notifyConfigurationListeners();
CompletableFuture startupRevisionUpdate = 
        metaStorageMgr.notifyRevisionUpdateListenerOnStart();

return CompletableFuture.allOf(startupConfigurationUpdate, 
        startupRevisionUpdate, startFuture) {code}
but in this partial node you wait for the node to be started without calling 
{{notifyRevisionUpdateListenerOnStart}}, so it can't start. At least this 
is the root cause for {{ItIgniteDistributionZoneManagerNodeRestartTest}}. I 
bet the problem is the same for {{ItIgniteNodeRestartTest}}.

h3. Definition of done
Fix the partial node starts in {{ItIgniteNodeRestartTest}} and 
{{ItIgniteDistributionZoneManagerNodeRestartTest}}, so that the async start of 
components is actually awaited.

  was:
h3.  Motivation

After https://issues.apache.org/jira/browse/IGNITE-22071 is implemented, there 
are a couple of places where we explicitly start components for test purposes, 
so that a node starts only the needed components. In that code, we have a 
common pattern like 
{code:java}
for (IgniteComponent component : otherComponents) {
    component.start();

    components.add(component);
} {code}
In the review phase of https://issues.apache.org/jira/browse/IGNITE-22071 it 
was proposed to wait here for the component to be started.


The problem is that in the common node start you have to call 
{{metaStorageMgr.notifyRevisionUpdateListenerOnStart()}}, so that some 
components (like {{DistributionZoneManager}}) can proceed with their logic, 
because they wait for some VersionedValues:

 
{code:java}
CompletableFuture startupConfigurationUpdate = 
        notifyConfigurationListeners();
CompletableFuture startupRevisionUpdate = 
        metaStorageMgr.notifyRevisionUpdateListenerOnStart();

return CompletableFuture.allOf(startupConfigurationUpdate, 
        startupRevisionUpdate, startFuture) {code}
but in this partial node you wait for the node to be started without calling 
{{notifyRevisionUpdateListenerOnStart}}, so it can't start. At least this 
is the root cause for {{ItIgniteDistributionZoneManagerNodeRestartTest}}. I 
bet the problem is the same for {{ItIgniteNodeRestartTest}}.

h3. Definition of done
Fix the partial node starts in ItIgniteNodeRestartTest and 
ItIgniteDistributionZoneManagerNodeRestartTest, so that the async start of 
components is actually awaited.


> Fix start of partial nodes from ItIgniteNodeRestartTest and 
> ItIgniteDistributionZoneManagerNodeRestartTest
> --
>
> Key: IGNITE-22119
> URL: https://issues.apache.org/jira/browse/IGNITE-22119
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>
> h3.  Motivation
> After https://issues.apache.org/jira/browse/IGNITE-22071 is implemented, 
> there are a couple of places where we explicitly start components for test 
> purposes, so that a node starts only the needed components. In that code, we 
> have a common pattern like 
> {code:java}
> for (IgniteComponent component : otherComponents) {
>     component.start();
>     components.add(component);
> } {code}
> In the review phase of https://issues.apache.org/jira/browse/IGNITE-22071 it 
> was proposed to wait here for the component to be started.
> The problem is that in the common node start you have to call 
> {{metaStorageMgr.notifyRevisionUpdateListenerOnStart()}}, so that some 
> components (like {{DistributionZoneManager}}) can proceed with their logic, 
> because they wait for some VersionedValues:
>  
> {code:java}
> CompletableFuture startupConfigurationUpdate = 
>         notifyConfigurationListeners();
> CompletableFuture startupRevisionUpdate = 
>         metaStorageMgr.notifyRevisionUpdateListenerOnStart();
> return CompletableFuture.allOf(startupConfigurationUpdate, 
>         startupRevisionUpdate,

[jira] [Created] (IGNITE-22119) Fix start of partial nodes from ItIgniteNodeRestartTest and ItIgniteDistributionZoneManagerNodeRestartTest

2024-04-26 Thread Mirza Aliev (Jira)

Mirza Aliev created IGNITE-22119:


 Summary: Fix start of partial nodes from ItIgniteNodeRestartTest 
and ItIgniteDistributionZoneManagerNodeRestartTest
 Key: IGNITE-22119
 URL: https://issues.apache.org/jira/browse/IGNITE-22119
 Project: Ignite
  Issue Type: Improvement
Reporter: Mirza Aliev


h3.  Motivation

After https://issues.apache.org/jira/browse/IGNITE-22071 is implemented, there 
are a couple of places where we explicitly start components for test purposes, 
so that a node starts only the needed components. In that code, we have a 
common pattern like 
{code:java}
for (IgniteComponent component : otherComponents) {
    component.start();

    components.add(component);
} {code}
In the review phase of https://issues.apache.org/jira/browse/IGNITE-22071 it 
was proposed to wait here for the component to be started.


The problem is that in the common node start you have to call 
{{metaStorageMgr.notifyRevisionUpdateListenerOnStart()}}, so that some 
components (like {{DistributionZoneManager}}) can proceed with their logic, 
because they wait for some VersionedValues:

 
{code:java}
CompletableFuture startupConfigurationUpdate = 
        notifyConfigurationListeners();
CompletableFuture startupRevisionUpdate = 
        metaStorageMgr.notifyRevisionUpdateListenerOnStart();

return CompletableFuture.allOf(startupConfigurationUpdate, 
        startupRevisionUpdate, startFuture) {code}
but in this partial node you wait for the node to be started without calling 
{{notifyRevisionUpdateListenerOnStart}}, so it can't start. At least this 
is the root cause for {{ItIgniteDistributionZoneManagerNodeRestartTest}}. I 
bet the problem is the same for {{ItIgniteNodeRestartTest}}.

h3. Definition of done
Fix the partial node starts in ItIgniteNodeRestartTest and 
ItIgniteDistributionZoneManagerNodeRestartTest, so that the async start of 
components is actually awaited.
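
One possible shape of the fix is sketched below. It is only an illustration of
the idea, not the test code itself: ComponentSketch stands in for a component
with an asynchronous start, and the notifyOnStart supplier stands in for the
revision update notification. Both names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical stand-in for a component whose start is asynchronous. */
interface ComponentSketch {
    CompletableFuture<Void> start();
}

/** Illustrative partial node start that actually waits for its components. */
final class PartialNodeStartSketch {
    private PartialNodeStartSketch() {
    }

    /**
     * Starts the given components, triggers the startup notification and returns a future
     * that completes only when every component start and the notification have completed.
     */
    static CompletableFuture<Void> startAll(
            List<ComponentSketch> components,
            Supplier<CompletableFuture<Void>> notifyOnStart
    ) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();

        for (ComponentSketch component : components) {
            futures.add(component.start());
        }

        // Without this call, components waiting for the notification would never finish starting.
        futures.add(notifyOnStart.get());

        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }
}
```

The key point is the ordering: the notification is triggered before the combined
future is awaited, so a component that waits for it does not block the start.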




[jira] [Updated] (IGNITE-22117) Node restart fails due to error: marshaller mappings storage is broken



 [ 
https://issues.apache.org/jira/browse/IGNITE-22117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor updated IGNITE-22117:
--
  Component/s: (was: persistence)
Affects Version/s: (was: 2.9.1)

> Node restart fails due to error: marshaller mappings storage is broken
> --
>
> Key: IGNITE-22117
> URL: https://issues.apache.org/jira/browse/IGNITE-22117
> Project: Ignite
>  Issue Type: Bug
>Reporter: Igor
>Priority: Major
>  Labels: ignite
>
> *Steps to reproduce:*
> 1. Start a cluster of 3 nodes.
> 2. Create 4 tables with up to 10 rows each.
> 3. Continuously update data in the tables.
> 4. During the updates, randomly restart a node.
> *Expected:*
> The node starts successfully.
> *Actual:*
> An error happens during the node start:
> {code:java}
> [2024-04-24T21:59:10.738+0300][ERROR][main] Exception during start 
> processors, node will be stopped and close connections
> org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1941)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1165) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1709)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1146) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:637) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.IgniteSpring.start(IgniteSpring.java:65) 
> [ignite-spring-8.9.3.jar:8.9.3]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.start(IgniteStarter.java:140)
>  [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.main(IgniteStarter.java:73) 
> [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Class name is null for 
> [platformId=0, typeId=-852964974], marshaller mappings storage is broken. 
> Clean up marshaller directory (/marshaller) and restart the node. 
> File name: -852964974.classname0, FileSize: 0
>   at 
> org.apache.ignite.internal.MarshallerMappingFileStore.restoreMappings(MarshallerMappingFileStore.java:218)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.MarshallerContextImpl.onMarshallerProcessorStarted(MarshallerContextImpl.java:536)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.processors.marshaller.GridMarshallerMappingProcessor.start(GridMarshallerMappingProcessor.java:114)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1938)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   ... 8 more
> [2024-04-24T21:59:10.751+0300][ERROR][main] Got exception while starting 
> (will rollback startup routine).
> org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1941)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1165) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1709)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1146) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:637) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.IgniteSpring.start(IgniteSpring.java:65) 
> [ignite-spring-8.9.3.jar:8.9.3]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.start(IgniteStarter.java:140)
>  [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.main(IgniteStarter.java:73) 
> [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Class name is null for 
> [platformId=0, typeId=-852964974], marshaller mappings storage is broken. 
> Clean up marshaller directory (/marshaller) and restart the node. 
> File name: -852964974.classname0, FileSize: 0
>   at 
> org.apache.ignite.internal.MarshallerMappingFileStore.restoreMappings(MarshallerMappingFileStore.java:218)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
>

[jira] [Updated] (IGNITE-22117) Node restart fails due to error: marshaller mappings storage is broken



 [ 
https://issues.apache.org/jira/browse/IGNITE-22117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor updated IGNITE-22117:
--
Labels:   (was: ignite)

> Node restart fails due to error: marshaller mappings storage is broken
> --
>
> Key: IGNITE-22117
> URL: https://issues.apache.org/jira/browse/IGNITE-22117
> Project: Ignite
>  Issue Type: Bug
>Reporter: Igor
>Priority: Major
>
> *Steps to reproduce:*
> 1. Start a cluster of 3 nodes.
> 2. Create 4 tables with up to 10 rows each.
> 3. Continuously update data in the tables.
> 4. During the updates, randomly restart a node.
> *Expected:*
> The node starts successfully.
> *Actual:*
> An error happens during the node start:
> {code:java}
> [2024-04-24T21:59:10.738+0300][ERROR][main] Exception during start 
> processors, node will be stopped and close connections
> org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1941)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1165) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1709)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1146) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:637) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.IgniteSpring.start(IgniteSpring.java:65) 
> [ignite-spring-8.9.3.jar:8.9.3]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.start(IgniteStarter.java:140)
>  [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.main(IgniteStarter.java:73) 
> [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Class name is null for 
> [platformId=0, typeId=-852964974], marshaller mappings storage is broken. 
> Clean up marshaller directory (/marshaller) and restart the node. 
> File name: -852964974.classname0, FileSize: 0
>   at 
> org.apache.ignite.internal.MarshallerMappingFileStore.restoreMappings(MarshallerMappingFileStore.java:218)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.MarshallerContextImpl.onMarshallerProcessorStarted(MarshallerContextImpl.java:536)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.processors.marshaller.GridMarshallerMappingProcessor.start(GridMarshallerMappingProcessor.java:114)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1938)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   ... 8 more
> [2024-04-24T21:59:10.751+0300][ERROR][main] Got exception while starting 
> (will rollback startup routine).
> org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1941)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1165) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1709)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1146) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:637) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.IgniteSpring.start(IgniteSpring.java:65) 
> [ignite-spring-8.9.3.jar:8.9.3]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.start(IgniteStarter.java:140)
>  [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.main(IgniteStarter.java:73) 
> [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Class name is null for 
> [platformId=0, typeId=-852964974], marshaller mappings storage is broken. 
> Clean up marshaller directory (/marshaller) and restart the node. 
> File name: -852964974.classname0, FileSize: 0
>   at 
> org.apache.ignite.internal.MarshallerMappingFileStore.restoreMappings(MarshallerMappingFileStore.java:218)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
>

[jira] [Resolved] (IGNITE-22117) Node restart fails due to error: marshaller mappings storage is broken



 [ 
https://issues.apache.org/jira/browse/IGNITE-22117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor resolved IGNITE-22117.
---
Resolution: Abandoned

> Node restart fails due to error: marshaller mappings storage is broken
> --
>
> Key: IGNITE-22117
> URL: https://issues.apache.org/jira/browse/IGNITE-22117
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.9.1
>Reporter: Igor
>Priority: Major
>  Labels: ignite
>
> *Steps to reproduce:*
> 1. Start a cluster of 3 nodes.
> 2. Create 4 tables with up to 10 rows each.
> 3. Continuously update data in the tables.
> 4. During the updates, randomly restart a node.
> *Expected:*
> The node starts successfully.
> *Actual:*
> An error happens during the node start:
> {code:java}
> [2024-04-24T21:59:10.738+0300][ERROR][main] Exception during start 
> processors, node will be stopped and close connections
> org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1941)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1165) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1709)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1146) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:637) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.IgniteSpring.start(IgniteSpring.java:65) 
> [ignite-spring-8.9.3.jar:8.9.3]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.start(IgniteStarter.java:140)
>  [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.main(IgniteStarter.java:73) 
> [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Class name is null for 
> [platformId=0, typeId=-852964974], marshaller mappings storage is broken. 
> Clean up marshaller directory (/marshaller) and restart the node. 
> File name: -852964974.classname0, FileSize: 0
>   at 
> org.apache.ignite.internal.MarshallerMappingFileStore.restoreMappings(MarshallerMappingFileStore.java:218)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.MarshallerContextImpl.onMarshallerProcessorStarted(MarshallerContextImpl.java:536)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.processors.marshaller.GridMarshallerMappingProcessor.start(GridMarshallerMappingProcessor.java:114)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1938)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   ... 8 more
> [2024-04-24T21:59:10.751+0300][ERROR][main] Got exception while starting 
> (will rollback startup routine).
> org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1941)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1165) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1709)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1146) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:637) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.IgniteSpring.start(IgniteSpring.java:65) 
> [ignite-spring-8.9.3.jar:8.9.3]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.start(IgniteStarter.java:140)
>  [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.main(IgniteStarter.java:73) 
> [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Class name is null for 
> [platformId=0, typeId=-852964974], marshaller mappings storage is broken. 
> Clean up marshaller directory (/marshaller) and restart the node. 
> File name: -852964974.classname0, FileSize: 0
>   at 
> org.apache.ignite.internal.MarshallerMappingFileStore.restoreMappings(MarshallerMappingFileStore.java:218)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>

[jira] [Closed] (IGNITE-22117) Node restart fails due to error: marshaller mappings storage is broken



 [ 
https://issues.apache.org/jira/browse/IGNITE-22117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor closed IGNITE-22117.
-
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Node restart fails due to error: marshaller mappings storage is broken
> --
>
> Key: IGNITE-22117
> URL: https://issues.apache.org/jira/browse/IGNITE-22117
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.9.1
>Reporter: Igor
>Priority: Major
>  Labels: ignite
>
> *Steps to reproduce:*
> 1. Start a cluster of 3 nodes.
> 2. Create 4 tables with up to 10 rows each.
> 3. Continuously update data in the tables.
> 4. During the updates, randomly restart a node.
> *Expected:*
> The node starts successfully.
> *Actual:*
> An error happens during the node start:
> {code:java}
> [2024-04-24T21:59:10.738+0300][ERROR][main] Exception during start 
> processors, node will be stopped and close connections
> org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1941)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1165) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1709)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1146) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:637) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.IgniteSpring.start(IgniteSpring.java:65) 
> [ignite-spring-8.9.3.jar:8.9.3]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.start(IgniteStarter.java:140)
>  [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.main(IgniteStarter.java:73) 
> [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Class name is null for 
> [platformId=0, typeId=-852964974], marshaller mappings storage is broken. 
> Clean up marshaller directory (/marshaller) and restart the node. 
> File name: -852964974.classname0, FileSize: 0
>   at 
> org.apache.ignite.internal.MarshallerMappingFileStore.restoreMappings(MarshallerMappingFileStore.java:218)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.MarshallerContextImpl.onMarshallerProcessorStarted(MarshallerContextImpl.java:536)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.processors.marshaller.GridMarshallerMappingProcessor.start(GridMarshallerMappingProcessor.java:114)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1938)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   ... 8 more
> [2024-04-24T21:59:10.751+0300][ERROR][main] Got exception while starting 
> (will rollback startup routine).
> org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
>   at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1941)
>  ~[ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1165) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1709)
>  [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1146) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:637) 
> [ignite-core-8.9.3.jar:8.9.3]
>   at org.apache.ignite.IgniteSpring.start(IgniteSpring.java:65) 
> [ignite-spring-8.9.3.jar:8.9.3]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.start(IgniteStarter.java:140)
>  [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
>   at 
> org.gridgain.poc.framework.starter.IgniteStarter.main(IgniteStarter.java:73) 
> [poc-tester-ignite2-0.5.0-SNAPSHOT.jar:?]
> Caused by: org.apache.ignite.IgniteCheckedException: Class name is null for 
> [platformId=0, typeId=-852964974], marshaller mappings storage is broken. 
> Clean up marshaller directory (/marshaller) and restart the node. 
> File name: -852964974.classname0, FileSize: 0
>   at 
> org.apache.ignite.internal.MarshallerMappingFileStore.restoreMappings(MarshallerMappingFileStore.java:218)
>

[jira] [Created] (IGNITE-22118) Explicitly specify pool to execute start/stop on IgniteComponent

2024-04-26 Thread Jira

 Kirill Sizov created IGNITE-22118:
--

 Summary: Explicitly specify pool to execute start/stop on 
IgniteComponent
 Key: IGNITE-22118
 URL: https://issues.apache.org/jira/browse/IGNITE-22118
 Project: Ignite
  Issue Type: Task
Reporter:  Kirill Sizov


The start/stop methods of {{IgniteComponent}} should be called from specific 
thread pools.

At the moment start is called from two different threads: some components 
start on _main_ while the others start on _startupExecutor_. 

We need a consistent process for starting and stopping all components.
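
As a rough illustration of what "a specific pool" could look like, the sketch
below routes every start and stop call through one explicitly supplied executor.
LifecycleSketch and ComponentStarterSketch are hypothetical names, and stopping
in reverse order is an assumption of this sketch, not something the ticket
prescribes.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical stand-in for a component with start/stop methods. */
interface LifecycleSketch {
    void start();

    void stop();
}

/** Illustrative helper that runs all lifecycle calls on one explicitly chosen executor. */
final class ComponentStarterSketch {
    private final Executor lifecycleExecutor;

    ComponentStarterSketch(Executor lifecycleExecutor) {
        this.lifecycleExecutor = lifecycleExecutor;
    }

    /** Starts components one after another, always on the lifecycle executor. */
    CompletableFuture<Void> startAll(List<LifecycleSketch> components) {
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);

        for (LifecycleSketch component : components) {
            chain = chain.thenRunAsync(component::start, lifecycleExecutor);
        }

        return chain;
    }

    /** Stops components in reverse order (an assumption), always on the lifecycle executor. */
    CompletableFuture<Void> stopAll(List<LifecycleSketch> components) {
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);

        for (int i = components.size() - 1; i >= 0; i--) {
            chain = chain.thenRunAsync(components.get(i)::stop, lifecycleExecutor);
        }

        return chain;
    }
}
```

Passing the executor in explicitly makes the thread on which a component starts
or stops a decision of the caller, rather than a side effect of where the call
happens to be made.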





[jira] [Assigned] (IGNITE-21881) Deal with retry send metastorage raft commands after a timeout

2024-04-26 Thread Vyacheslav Koptilin (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-21881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin reassigned IGNITE-21881:


Assignee: Denis Chudov

> Deal with retry send metastorage raft commands after a timeout
> --
>
> Key: IGNITE-21881
> URL: https://issues.apache.org/jira/browse/IGNITE-21881
> Project: Ignite
>  Issue Type: Bug
>Reporter: Kirill Tkalenko
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> As a result of the analysis and reproduction of IGNITE-21142, it was found 
> that the metastorage raft command can be re-sent if it does not time out, 
> which may not be good and lead to hidden negative consequences, such as in 
> IGNITE-21142.
> Here we need to find out the reasons for this decision (with re-try by 
> timeout) and understand what to do next. I think we should use an infinite 
> timeout.
> h3. Upd#1
> As discussed, we need to detect whether an InvokeCommand has already been 
> processed on the server and, if so, resend the original response instead of 
> reprocessing the command. This applies not only to invoke but to all 
> non-idempotent commands, such as getAndPut, getAndPutAll, getAndRemove, etc. 
> Note, though, that this concerns only MS and possibly CMG, not partitions: 
> within partitions, the tx protocol, along with returning results from 
> indexes instead of from raft, protects us from reprocessing non-idempotent 
> commands.
> All in all, the following solution is expected to be implemented:
>  * A new interface, NonIdempotentCommand, is introduced with an id field.
>  * All MS non-idempotent commands, such as InvokeCommand, GetAndRemoveCommand, 
> etc., implement the aforementioned interface.
>  * On the client side, an identifier is added to the command. Two options are 
> possible here:
>  ** Set the id on the command at creation time. This is the easiest way, 
> but it will require extra effort on the server side to track command 
> time. In that case, LongCounter + nodeId can be used as the id. 
>  ** Or assign the id to the command within the retry loop. In that case 
> the id can serve as the "command time", which also means that the clock 
> or System.currentTime<> should be used as the id. I strongly believe that the 
> first option is better for now. 
>  * On the server side, specifically within the MS state machine, a new 
> nonIdempotentCommandCache is introduced: commandId -> (commandResult, 
> commandStartTime).
>  * For each NonIdempotentCommand, the following logic should be implemented:
>  ** First, check whether the cache contains a command with the 
> given id; if it does, return the cached result without reprocessing the 
> command.
>  ** If the command is not in the cache, process it and populate the 
> cache with the result.
> Basically that's all. Both cache persistence and recovery on group restart 
> and cache cleanup will be covered within separate tickets.
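
The steps listed in Upd#1 can be sketched roughly as follows. This is only an illustration under stated assumptions: every class here ({{CommandIdGenerator}}, {{CachedResult}}, {{StateMachine}}) is a hypothetical stand-in rather than an existing Ignite 3 type, the id is modelled as a string built from nodeId plus a counter (the first option), and the cache is a plain in-memory map with no persistence, recovery or cleanup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical sketch; these are not the actual Ignite 3 classes. */
class NonIdempotentCommandSketch {
    /** Marker for commands that must not be applied twice. */
    interface NonIdempotentCommand {
        String id();
    }

    /** Cached outcome of a command: its result and the time processing started. */
    static final class CachedResult {
        final Object commandResult;
        final long commandStartTime;

        CachedResult(Object commandResult, long commandStartTime) {
            this.commandResult = commandResult;
            this.commandStartTime = commandStartTime;
        }
    }

    /** Client side: the id is assigned once, at command creation (counter + nodeId). */
    static final class CommandIdGenerator {
        private final String nodeId;
        private final AtomicLong counter = new AtomicLong();

        CommandIdGenerator(String nodeId) {
            this.nodeId = nodeId;
        }

        String next() {
            return nodeId + ":" + counter.incrementAndGet();
        }
    }

    /** Server side: commandId -> (commandResult, commandStartTime). */
    static final class StateMachine {
        // Plain HashMap: assumes commands are applied by a single state machine thread.
        private final Map<String, CachedResult> nonIdempotentCommandCache = new HashMap<>();

        Object apply(NonIdempotentCommand command, Function<NonIdempotentCommand, Object> processor) {
            CachedResult cached = nonIdempotentCommandCache.get(command.id());
            if (cached != null) {
                // The command was already processed: return the original result.
                return cached.commandResult;
            }
            long startTime = System.currentTimeMillis();
            Object result = processor.apply(command);
            nonIdempotentCommandCache.put(command.id(), new CachedResult(result, startTime));
            return result;
        }
    }
}
```

Because the id is fixed at creation time, a command that is re-sent after a timeout carries the same id and hits the cache instead of being processed again.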



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-22117) Node restart fails due to error: marshaller mappings storage is broken