[jira] [Assigned] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance

2024-04-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21661:
--

Assignee: Ivan Bessonov

> Test scenario where all stable nodes are lost during a partially completed 
> rebalance
> 
>
> Key: IGNITE-21661
> URL: https://issues.apache.org/jira/browse/IGNITE-21661
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> The following case is possible:
>  * Nodes A, B and C for a partition
>  * B and C go offline
>  * new distribution is A, D and E
>  * full state transfer from A to D is completed
>  * full state transfer from A to E is not
>  * A goes offline
>  * we perform "resetPartitions"
> Ideally, we should use D as a new leader somehow, but the bare minimum should 
> be a partition that is functional, maybe an empty one. We should test this case.
>  
> This might be a good place to add more tests.
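A hedged sketch of what such a test could look like. Every helper below 
(startCluster, createZoneWithAssignments, awaitFullStateTransfer, 
blockFullStateTransfer, assertPartitionIsWritable) is a hypothetical 
placeholder, not the actual Ignite 3 test-framework API:

{code:java}
// Hypothetical integration-test sketch of the scenario above.
@Test
void resetPartitionsAfterLosingAllStableNodes() throws Exception {
    Cluster cluster = startCluster("A", "B", "C", "D", "E");
    createZoneWithAssignments(cluster, "A", "B", "C"); // stable = [A, B, C]

    cluster.stopNode("B");
    cluster.stopNode("C");

    changeAssignments(cluster, "A", "D", "E");  // pending = [A, D, E]
    awaitFullStateTransfer(cluster, "A", "D");  // D is fully caught up
    blockFullStateTransfer(cluster, "A", "E");  // E is not

    cluster.stopNode("A");                      // all stable nodes are lost

    cluster.resetPartitions("testZone");

    // Bare minimum: the partition becomes functional again (possibly empty);
    // ideally the new leader is D, which holds the transferred data.
    assertPartitionIsWritable(cluster, "testZone", 0);
}
{code}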



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance

2024-04-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21661:
---
Description: 
The following case is possible:
 * Nodes A, B and C for a partition
 * B and C go offline
 * new distribution is A, D and E
 * full state transfer from A to D is completed
 * full state transfer from A to E is not
 * A goes offline
 * we perform "resetPartitions"

Ideally, we should use D as a new leader somehow, but the bare minimum should 
be a partition that is functional, maybe an empty one. We should test this case.

 

This might be a good place to add more tests.

  was:
The following case is possible:
 * Nodes A, B and C for a partition
 * B and C go offline
 * new distribution is A, D and E
 * full state transfer from A to D is completed
 * full state transfer from A to E is not
 * A goes offline
 * we perform "resetPartitions"

Ideally, we should use D as a new leader somehow, but the bare minimum should 
be a partition that is functional, maybe an empty one. We should test this case.


> Test scenario where all stable nodes are lost during a partially completed 
> rebalance
> 
>
> Key: IGNITE-21661
> URL: https://issues.apache.org/jira/browse/IGNITE-21661
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> The following case is possible:
>  * Nodes A, B and C for a partition
>  * B and C go offline
>  * new distribution is A, D and E
>  * full state transfer from A to D is completed
>  * full state transfer from A to E is not
>  * A goes offline
>  * we perform "resetPartitions"
> Ideally, we should use D as a new leader somehow, but the bare minimum should 
> be a partition that is functional, maybe an empty one. We should test this case.
>  
> This might be a good place to add more tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22107) Properly encapsulate partition meta

2024-04-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22107:
---
Description: 
{{PartitionMeta}} and {{PartitionMetaIo}} leak specific implementation details, 
specifically - all fields except for {{{}pageCount{}}}. This breaks 
encapsulation and makes {{page-memory}} module code non-reusable.

I propose splitting meta into 2 parts - abstract meta, that would only hold 
page count, and specific meta that will be located in a different module, close 
to the implementation.

In this case, we would have to pass meta IO as parameters into methods like 
{{{}PartitionMetaManager#readOrCreateMeta{}}}, and create a getter for IO in 
{{AbstractPartitionMeta}} class itself, but that's a necessary sacrifice. Some 
other places will be affected as well, mostly tests.

  was:
`PartitionMeta` and `PartitionMetaIo` leak specific implementation details, 
specifically - all fields except for `pageCount`. This breaks encapsulation and 
makes `page-memory` module code non-reusable.

I propose splitting meta into 2 parts - abstract meta, that would only hold 
page count, and specific meta that will be located in a different module, close 
to the implementation.

In this case, we would have to pass meta IO as parameters into methods like 
`PartitionMetaManager#readOrCreateMeta`, and create a getter for IO in 
`AbstractPartitionMeta` class itself, but that's a necessary sacrifice.


> Properly encapsulate partition meta
> ---
>
> Key: IGNITE-22107
> URL: https://issues.apache.org/jira/browse/IGNITE-22107
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> {{PartitionMeta}} and {{PartitionMetaIo}} leak specific implementation 
> details, specifically - all fields except for {{{}pageCount{}}}. This breaks 
> encapsulation and makes {{page-memory}} module code non-reusable.
> I propose splitting meta into 2 parts - abstract meta, that would only hold 
> page count, and specific meta that will be located in a different module, 
> close to the implementation.
> In this case, we would have to pass meta IO as parameters into methods like 
> {{{}PartitionMetaManager#readOrCreateMeta{}}}, and create a getter for IO in 
> {{AbstractPartitionMeta}} class itself, but that's a necessary sacrifice. 
> Some other places will be affected as well, mostly tests.
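A minimal sketch of the proposed split, assuming hypothetical names 
(AbstractPartitionMeta, metaIo(), PersistentPartitionMeta); the real 
signatures in the page-memory module may differ:

{code:java}
// Generic part, stays in the page-memory module: only the page count.
public abstract class AbstractPartitionMeta {
    private volatile int pageCount;

    public int pageCount() {
        return pageCount;
    }

    public void incrementPageCount() {
        pageCount++;
    }

    /** IO for this meta; supplied by the engine-specific subclass. */
    public abstract PartitionMetaIo metaIo();
}

// Engine-specific part, lives in the storage module, close to the implementation.
class PersistentPartitionMeta extends AbstractPartitionMeta {
    private long freeListRootPageId; // example of an implementation detail

    @Override
    public PartitionMetaIo metaIo() {
        return PersistentPartitionMetaIo.INSTANCE;
    }
}
{code}

With this shape, methods like PartitionMetaManager#readOrCreateMeta would take 
the IO as a parameter instead of hard-coding it.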



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22107) Properly encapsulate partition meta

2024-04-25 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22107:
--

 Summary: Properly encapsulate partition meta
 Key: IGNITE-22107
 URL: https://issues.apache.org/jira/browse/IGNITE-22107
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
 Fix For: 3.0.0-beta2


`PartitionMeta` and `PartitionMetaIo` leak specific implementation details, 
specifically - all fields except for `pageCount`. This breaks encapsulation and 
makes `page-memory` module code non-reusable.

I propose splitting meta into 2 parts - abstract meta, that would only hold 
page count, and specific meta that will be located in a different module, close 
to the implementation.

In this case, we would have to pass meta IO as parameters into methods like 
`PartitionMetaManager#readOrCreateMeta`, and create a getter for IO in 
`AbstractPartitionMeta` class itself, but that's a necessary sacrifice.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21434) Fail user write requests for non-available partitions

2024-04-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21434.

Resolution: Won't Fix

This insert doesn't hang indefinitely anymore; it now fails while awaiting the 
primary replica. I'm closing the issue as "Won't Fix".

> Fail user write requests for non-available partitions
> -
>
> Key: IGNITE-21434
> URL: https://issues.apache.org/jira/browse/IGNITE-21434
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Currently, {{INSERT INTO test VALUES(%d, %d);}} just hangs indefinitely, 
> which is not what you would expect. We should either fail the request 
> immediately if there's no majority, or return a replication timeout 
> exception, for example.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22075) GC doesn't wait for RO transactions

2024-04-19 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22075:
--

 Summary: GC doesn't wait for RO transactions
 Key: IGNITE-22075
 URL: https://issues.apache.org/jira/browse/IGNITE-22075
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov
 Fix For: 3.0.0-beta2


In https://issues.apache.org/jira/browse/IGNITE-21773 we started handling the 
LWM update concurrently in both the TX manager and the GC, which means that the 
GC might start collecting garbage before RO transactions are finished. This 
doesn't even depend on the listener order, because both operations are 
asynchronous.

We must fix it.
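A minimal sketch of one possible fix, assuming hypothetical component and 
method names (txManager, gc, whenRoTxsBelowLwmFinish, collectGarbageUpTo); the 
point is to order the two reactions to an LWM update explicitly instead of 
relying on two independent asynchronous listeners:

{code:java}
// Hedged sketch; these are not the actual Ignite 3 APIs.
CompletableFuture<Void> onLwmUpdated(HybridTimestamp newLwm) {
    return txManager.whenRoTxsBelowLwmFinish(newLwm)       // wait for RO txs
            .thenRun(() -> gc.collectGarbageUpTo(newLwm)); // only then collect
}
{code}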



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22041) Secondary indexes inline size calculation is wrong

2024-04-17 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22041:
---
Description: 
* "short" size is used as 16 bytes instead of 2 bytes
 * decimal header is not included in estimation
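An illustrative, self-contained sketch of the corrected estimation for the two 
errors above; the enum and the decimal header size below are assumptions, not 
the actual calculator code:

{code:java}
enum ColumnType { INT16, DECIMAL }

static int inlineSizeEstimate(ColumnType type) {
    switch (type) {
        case INT16:
            return Short.BYTES; // 2 bytes, previously counted as 16
        case DECIMAL:
            int decimalHeader = 2;    // hypothetical header size
            int payloadEstimate = 16; // assumed average payload
            return decimalHeader + payloadEstimate;
        default:
            throw new IllegalArgumentException(type.toString());
    }
}
{code}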

> Secondary indexes inline size calculation is wrong
> --
>
> Key: IGNITE-22041
> URL: https://issues.apache.org/jira/browse/IGNITE-22041
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * "short" size is used as 16 bytes instead of 2 bytes
>  * decimal header is not included in estimation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22063) aimem partition deletion doesn't delete GC queue

2024-04-17 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22063:
--

 Summary: aimem partition deletion doesn't delete GC queue
 Key: IGNITE-22063
 URL: https://issues.apache.org/jira/browse/IGNITE-22063
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


{{org.apache.ignite.internal.storage.pagememory.mv.VolatilePageMemoryMvPartitionStorage#destroyStructures}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22050) Data structures don't clear partId of reused page

2024-04-17 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22050:
---
Description: 
In the current implementation we use a single reuse list for all partitions in 
the aimem storage engine.

That works fine in Ignite 2, but here in Ignite 3 we implemented a 
"partitionless link" format that eliminates the 2 bytes indicating the 
partition number from the data in pages. This means that if the allocator 
provides a structure with a page from partition X, but the structure itself 
represents partition Y, we lose the "X" in the process and will later try to 
access the page by a pageId that has Y encoded in it. This leads to a pageId 
mismatch.

We have several options here.
 * ignore mismatched partitions
 * get rid of partitionless pageIds
 * fix the allocator, so that it would change partition Id upon allocation

Ideally, we should go with the 3rd option. It requires some slight changes in 
the internal data structure API, so that we would pass the required partitionId 
directly into the allocator (reuse list). This is a little bit excessive at 
first sight, but seems more appropriate in the long run. Ignite 2 pageIds are 
all messed up inside of structures; we can fix that.
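A hedged sketch of option 3: when the reuse list hands out a page that was 
freed by another partition, rewrite the partition-id bits of the pageId before 
returning it. The bit layout below (16-bit partition id in bits 32..47) follows 
the classic Ignite 2 scheme and is an assumption here:

{code:java}
// Hedged sketch; not the actual PageIdUtils code.
static long changePartitionId(long pageId, int newPartId) {
    long partMask = 0xFFFFL << 32; // partition id occupies bits 32..47
    return (pageId & ~partMask) | (((long) newPartId & 0xFFFFL) << 32);
}
{code}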

> Data structures don't clear partId of reused page
> -
>
> Key: IGNITE-22050
> URL: https://issues.apache.org/jira/browse/IGNITE-22050
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the current implementation we use a single reuse list for all partitions 
> in the aimem storage engine.
> That works fine in Ignite 2, but here in Ignite 3 we implemented a 
> "partitionless link" format that eliminates the 2 bytes indicating the 
> partition number from the data in pages. This means that if the allocator 
> provides a structure with a page from partition X, but the structure itself 
> represents partition Y, we lose the "X" in the process and will later try to 
> access the page by a pageId that has Y encoded in it. This leads to a pageId 
> mismatch.
> We have several options here.
>  * ignore mismatched partitions
>  * get rid of partitionless pageIds
>  * fix the allocator, so that it would change partition Id upon allocation
> Ideally, we should go with the 3rd option. It requires some slight changes in 
> the internal data structure API, so that we would pass the required 
> partitionId directly into the allocator (reuse list). This is a little bit 
> excessive at first sight, but seems more appropriate in the long run. Ignite 2 
> pageIds are all messed up inside of structures; we can fix that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-22055) Shut destruction executor down before closing volatile regions

2024-04-17 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-22055.

  Reviewer: Ivan Bessonov
Resolution: Fixed

> Shut destruction executor down before closing volatile regions
> --
>
> Key: IGNITE-22055
> URL: https://issues.apache.org/jira/browse/IGNITE-22055
> Project: Ignite
>  Issue Type: Bug
>Reporter: Roman Puchkovskiy
>Assignee: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22058) Use paranoid leak detection in tests

2024-04-17 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22058:
--

 Summary: Use paranoid leak detection in tests
 Key: IGNITE-22058
 URL: https://issues.apache.org/jira/browse/IGNITE-22058
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
 Fix For: 3.0.0-beta2


We should set `io.netty.leakDetection.level=paranoid` in integration tests and 
network tests in order to detect possible leaks.
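For reference, the switch can be applied either as a JVM argument in the test 
tasks or programmatically via Netty's ResourceLeakDetector API:

{code:java}
import io.netty.util.ResourceLeakDetector;

// Equivalent to passing -Dio.netty.leakDetection.level=paranoid to the JVM;
// must run before the first buffer is allocated.
ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
{code}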



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22050) Data structures don't clear partId of reused page

2024-04-16 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22050:
--

 Summary: Data structures don't clear partId of reused page
 Key: IGNITE-22050
 URL: https://issues.apache.org/jira/browse/IGNITE-22050
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0.0-beta2






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22041) Secondary indexes inline size calculation is wrong

2024-04-15 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22041:
--

 Summary: Secondary indexes inline size calculation is wrong
 Key: IGNITE-22041
 URL: https://issues.apache.org/jira/browse/IGNITE-22041
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21999) Merge partition free-lists into one

2024-04-11 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21999:
--

Assignee: Philipp Shergalis  (was: Ivan Bessonov)

> Merge partition free-lists into one
> ---
>
> Key: IGNITE-21999
> URL: https://issues.apache.org/jira/browse/IGNITE-21999
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Philipp Shergalis
>Priority: Major
>  Labels: ignite-3
>
> Current implementation has 2 free-lists:
>  * version chains
>  * index tuples
> These lists have separate buckets for different types of data pages. There 
> are issues with this approach:
>  * overhead on pages - we have to allocate more pages to store buckets
>  * overhead on checkpoints - we have to save twice as many free-lists on 
> every checkpoint
> The reason, to my understanding, is the fact that FreeList class is 
> parameterized with the specific type of data that it stores. It makes no 
> sense to me, to be completely honest, because the algorithm is always the 
> same, and we always use the code from abstract free-list implementation.
> What I propose:
>  * get rid of abstract implementation and only have the concrete 
> implementation of free lists
>  * same for data pages
>  * serialization code will be fully moved to implementations of Storeable
> We're losing some guarantees with this change - we can no longer check that 
> the type of the page is correct. My response to this issue is that every 
> Storeable could add a 1-byte header to the data in order to validate it when 
> being read; that should be enough. If we could find a way to store less than 
> 1 byte, that would be nice; I didn't look too much into the question.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21999) Merge partition free-lists into one

2024-04-11 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21999:
--

Assignee: Ivan Bessonov

> Merge partition free-lists into one
> ---
>
> Key: IGNITE-21999
> URL: https://issues.apache.org/jira/browse/IGNITE-21999
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Current implementation has 2 free-lists:
>  * version chains
>  * index tuples
> These lists have separate buckets for different types of data pages. There 
> are issues with this approach:
>  * overhead on pages - we have to allocate more pages to store buckets
>  * overhead on checkpoints - we have to save twice as many free-lists on 
> every checkpoint
> The reason, to my understanding, is the fact that FreeList class is 
> parameterized with the specific type of data that it stores. It makes no 
> sense to me, to be completely honest, because the algorithm is always the 
> same, and we always use the code from abstract free-list implementation.
> What I propose:
>  * get rid of abstract implementation and only have the concrete 
> implementation of free lists
>  * same for data pages
>  * serialization code will be fully moved to implementations of Storeable
> We're losing some guarantees with this change - we can no longer check that 
> the type of the page is correct. My response to this issue is that every 
> Storeable could add a 1-byte header to the data in order to validate it when 
> being read; that should be enough. If we could find a way to store less than 
> 1 byte, that would be nice; I didn't look too much into the question.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21999) Merge partition free-lists into one

2024-04-08 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21999:
--

 Summary: Merge partition free-lists into one
 Key: IGNITE-21999
 URL: https://issues.apache.org/jira/browse/IGNITE-21999
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Current implementation has 2 free-lists:
 * version chains
 * index tuples

These lists have separate buckets for different types of data pages. There are 
issues with this approach:
 * overhead on pages - we have to allocate more pages to store buckets
 * overhead on checkpoints - we have to save twice as many free-lists on every 
checkpoint

The reason, to my understanding, is the fact that FreeList class is 
parameterized with the specific type of data that it stores. It makes no sense 
to me, to be completely honest, because the algorithm is always the same, and 
we always use the code from abstract free-list implementation.

What I propose:
 * get rid of abstract implementation and only have the concrete implementation 
of free lists
 * same for data pages
 * serialization code will be fully moved to implementations of Storeable

We're losing some guarantees with this change - we can no longer check that the 
type of the page is correct. My response to this issue is that every Storeable 
could add a 1-byte header to the data in order to validate it when being read; 
that should be enough. If we could find a way to store less than 1 byte, that 
would be nice; I didn't look too much into the question.
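A hedged sketch of the 1-byte header idea, with hypothetical names; the actual 
serialization code would live in the Storeable implementations:

{code:java}
import java.nio.ByteBuffer;

final class TypedRowIo {
    static final byte VERSION_CHAIN = 1;
    static final byte INDEX_TUPLE = 2;

    static void write(ByteBuffer pageBuf, byte typeHeader, byte[] payload) {
        pageBuf.put(typeHeader); // 1-byte type marker before the payload
        pageBuf.put(payload);
    }

    static byte[] read(ByteBuffer pageBuf, byte expectedType, int payloadLen) {
        byte actual = pageBuf.get();
        if (actual != expectedType) { // validate the type on read
            throw new IllegalStateException("Unexpected row type: " + actual);
        }
        byte[] payload = new byte[payloadLen];
        pageBuf.get(payload);
        return payload;
    }
}
{code}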



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21257) Public Java API to get global partition states

2024-04-05 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21257:
--

Assignee: Ivan Bessonov

> Public Java API to get global partition states
> --
>
> Key: IGNITE-21257
> URL: https://issues.apache.org/jira/browse/IGNITE-21257
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
> list.
> We should use the local partition states implemented in IGNITE-21256 and 
> combine them in a cluster-wide compute call before returning to the user.
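A hedged, self-contained sketch of the aggregation step; the state enum and 
the "worst state wins" merge rule are assumptions here, since the real API 
shape is defined in IGNITE-21140:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionStateAggregator {
    enum GlobalState { AVAILABLE, DEGRADED, UNAVAILABLE }

    /** Combines per-node local states into one global state per partition. */
    static Map<Integer, GlobalState> aggregate(List<Map<Integer, GlobalState>> perNode) {
        Map<Integer, GlobalState> result = new HashMap<>();
        for (Map<Integer, GlobalState> node : perNode) {
            node.forEach((partId, state) -> result.merge(partId, state,
                    (a, b) -> a.ordinal() >= b.ordinal() ? a : b)); // keep the worst
        }
        return result;
    }
}
{code}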



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21987) Optimize RO scan in sorted indexes

2024-04-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21987:
---
Description: 
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)

IMPORTANT: we should throw an exception if somebody scans an index and 
IndexStorage#getNextRowIdToBuild is not null. It should be a new error, like 
"IndexNotBuiltException"

  was:
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)

IMPORTANT: we should throw an exception if somebody scans an index and 
IndexStorage#getNextRowIdToBuild is not null.


> Optimize RO scan in sorted indexes
> --
>
> Key: IGNITE-21987
> URL: https://issues.apache.org/jira/browse/IGNITE-21987
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> This issue applies to aimem/aipersist primarily. Optimization for rocksdb 
> might be done separately.
>  * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
> simple cursor
>  * in the implementation we should use alternative cursor implementation for 
> RO scans - it should delegate calls to B+Tree cursor
>  * reuse existing tests where possible
>  * call new method where necessary (PartitionReplicaListener#scanSortedIndex)
> IMPORTANT: we should throw an exception if somebody scans an index and 
> IndexStorage#getNextRowIdToBuild is not null. It should be a new error, like 
> "IndexNotBuiltException"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21987) Optimize RO scan in sorted indexes

2024-04-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21987:
---
Description: 
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)

IMPORTANT: we should throw an exception if somebody scans an index and 
IndexStorage#getNextRowIdToBuild is not null.

  was:
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)


> Optimize RO scan in sorted indexes
> --
>
> Key: IGNITE-21987
> URL: https://issues.apache.org/jira/browse/IGNITE-21987
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> This issue applies to aimem/aipersist primarily. Optimization for rocksdb 
> might be done separately.
>  * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
> simple cursor
>  * in the implementation we should use alternative cursor implementation for 
> RO scans - it should delegate calls to B+Tree cursor
>  * reuse existing tests where possible
>  * call new method where necessary (PartitionReplicaListener#scanSortedIndex)
> IMPORTANT: we should throw an exception if somebody scans an index and 
> IndexStorage#getNextRowIdToBuild is not null.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21987) Optimize RO scan in sorted indexes

2024-04-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21987:
---
Description: 
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)

  was:
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new flag RO_SCAN to SortedIndexStorage
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor, and "peek" should throw an 
"UnsupportedOperationException"
 * for "rocksdb" it shouldn't refresh the iterator all the time. "peek" should 
also throw exceptions
 * reuse existing tests
 * pass new RO_SCAN flag into a method where it's necessary


> Optimize RO scan in sorted indexes
> --
>
> Key: IGNITE-21987
> URL: https://issues.apache.org/jira/browse/IGNITE-21987
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> This issue applies to aimem/aipersist primarily. Optimization for rocksdb 
> might be done separately.
>  * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
> simple cursor
>  * in the implementation we should use alternative cursor implementation for 
> RO scans - it should delegate calls to B+Tree cursor
>  * reuse existing tests where possible
>  * call new method where necessary (PartitionReplicaListener#scanSortedIndex)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21987) Optimize RO scan in sorted indexes

2024-04-04 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21987:
--

 Summary: Optimize RO scan in sorted indexes
 Key: IGNITE-21987
 URL: https://issues.apache.org/jira/browse/IGNITE-21987
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new flag RO_SCAN to SortedIndexStorage
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor, and "peek" should throw an 
"UnsupportedOperationException"
 * for "rocksdb" it shouldn't refresh the iterator all the time. "peek" should 
also throw exceptions
 * reuse existing tests
 * pass new RO_SCAN flag into a method where it's necessary



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21906) Consider disabling inline in PK index by default

2024-04-02 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21906:
--

 Summary: Consider disabling inline in PK index by default
 Key: IGNITE-21906
 URL: https://issues.apache.org/jira/browse/IGNITE-21906
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


In aipersist/aimem we attempt to inline binary tuples into pages for hash 
indexes by default. This, in theory, saves us from the necessity of accessing 
binary tuples from data pages for comparison, which is slower than comparing 
inlined data.

But, assuming a good hash distribution, we would only have to do the real 
comparison for the matched tuple. At the same time, inlined data might be 
substantially larger than hash+link, meaning that a B+Tree with inlined data 
has a bigger height, which correlates with slower search speed.

So, we have both pros and cons for inlining, and the only real way to reconcile 
them is to compare them with some benchmarks. This is exactly what I propose.

TL;DR: force the inline size to 0 for hash indices and benchmark put/get 
operations with a large enough amount of data.
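A hedged JMH-style sketch of the proposed benchmark; the setup is hypothetical 
and the table/view wiring is only indicated in comments:

{code:java}
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class PkInlineBenchmark {
    // Assumed key-value view over a test table; wiring omitted.
    org.apache.ignite.table.KeyValueView<Integer, byte[]> view;

    @Setup
    public void setUp() {
        // Start a node with the hash-index inline size forced to 0 (vs. the
        // default), create a table, preload a large enough amount of data.
    }

    @Benchmark
    public byte[] get() {
        return view.get(null, 42); // implicit transaction
    }
}
{code}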



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21902) Add an option to configure log storage path

2024-04-02 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21902:
--

 Summary: Add an option to configure log storage path
 Key: IGNITE-21902
 URL: https://issues.apache.org/jira/browse/IGNITE-21902
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
 Fix For: 3.0.0-beta2


An option to store the log and data on separate devices can substantially 
improve performance in the long run for many users; we should implement it.

There is such an option in Ignite 2, and people use it all the time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21898) Remove reactive methods from AntiHijackingIgniteSql

2024-04-01 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21898.

  Reviewer: Ivan Bessonov
Resolution: Fixed

> Remove reactive methods from AntiHijackingIgniteSql
> ---
>
> Key: IGNITE-21898
> URL: https://issues.apache.org/jira/browse/IGNITE-21898
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> They were removed from IgniteSql interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance

2024-03-04 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21661:
--

 Summary: Test scenario where all stable nodes are lost during a 
partially completed rebalance
 Key: IGNITE-21661
 URL: https://issues.apache.org/jira/browse/IGNITE-21661
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


The following case is possible:
 * Nodes A, B and C for a partition
 * B and C go offline
 * new distribution is A, D and E
 * full state transfer from A to D is completed
 * full state transfer from A to E is not
 * A goes offline
 * we perform "resetPartitions"

Ideally, we should use D as a new leader somehow, but the bare minimum should 
be a partition that is functional, maybe an empty one. We should test this case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21284) Internal API for manual raft group configuration update

2024-02-23 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21284:
---
Description: 
We need an API (with implementation) that's analogous to 
"reset-lost-partitions", but with the ability to reuse a living minority of 
nodes.

This API should gather the states of partitions, identify healthy peers, and 
use them as a new raft group configuration (through the update of assignments).

We have to make sure that the node with the latest log index becomes the 
leader, so we will have to propagate the desired minimum log index in 
assignments and use it during voting.
h2. What's implemented

"resetPartitions" operation in distributed zone manager. It identifies 
partitions where only a minority of nodes is online (thus they won't be able to 
execute "changePeersAsync"), and writes a "forced pending assignments" for them.

Forced assignment excludes stable nodes that are not present in the pending 
assignment from the new raft group configuration. It also performs a 
"resetPeers" operation on the alive nodes from the stable assignment.

Complete loss of all nodes from the stable assignments is not yet implemented; 
at least one node is required, so that it can be elected as the leader.

  was:
We need an API (with implementation) that's analogous to 
"reset-lost-partitions", but with the ability to reuse a living minority of 
nodes.

This API should gather the states of partitions, identify healthy peers, and 
use them as a new raft group configuration (through the update of assignments).

We have to make sure that the node with the latest log index becomes the 
leader, so we will have to propagate the desired minimum log index in 
assignments and use it during voting.


> Internal API for manual raft group configuration update
> ---
>
> Key: IGNITE-21284
> URL: https://issues.apache.org/jira/browse/IGNITE-21284
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need an API (with implementation) that's analogous to 
> "reset-lost-partitions", but with the ability to reuse a living minority of 
> nodes.
> This API should gather the states of partitions, identify healthy peers, and 
> use them as a new raft group configuration (through the update of 
> assignments).
> We have to make sure that the node with the latest log index becomes the 
> leader, so we will have to propagate the desired minimum log index in 
> assignments and use it during voting.
> h2. What's implemented
> "resetPartitions" operation in distributed zone manager. It identifies 
> partitions where only a minority of nodes is online (thus they won't be able 
> to execute "changePeersAsync"), and writes a "forced pending assignments" for 
> them.
> Forced assignment excludes stable nodes that are not present in the pending 
> assignment from the new raft group configuration. It also performs a 
> "resetPeers" operation on the alive nodes from the stable assignment.
> Complete loss of all nodes from the stable assignments is not yet 
> implemented; at least one node is required, so that it can be elected as the 
> leader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21588) CMG commands idempotency is broken

2024-02-22 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21588:
---
Description: 
When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we 
do the following:
 * Read local state with {{{}readLogicalTopology(){}}}.
 * Modify state according to the command.
 * {*}Increase version{*}.
 * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}.

The problem lies in the reading and writing of the state - it's local, and the 
version value is not replicated.

What happens when we restart the node:
 * It starts without a local storage snapshot, with appliedIndex == 0, which is 
a {*}state in the past{*}.
 * We apply commands that were already applied before restart.
 * We apply these commands to locally saved topology snapshot.
 * This logical topology snapshot has a *state in the future* when compared to 
appliedIndex == 0.
 * As a result, when we re-apply some commands, we *increase the version* one 
more time, thus breaking data consistency between nodes.

This would have been fine if we only used this version locally. But 
distribution zones rely on the consistency of the version between all nodes in 
cluster. This might break DZ data nodes handling if any of the cluster nodes 
restarts.

How to fix:
 * Either drop the storage if there's no storage snapshot; this will restore 
consistency,
 * or never start the CMG group from a snapshot, but rather start it from the 
latest storage data.

  was:
When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we 
do the following:
 * Read local state with {{{}readLogicalTopology(){}}}.
 * Modify state according to the command.
 * {*}Increase version{*}.
 * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}.

The problem lies in the reading and writing of the state - it's local, and the 
version value is not replicated.

What happens when we restart the node:
 * It starts with a local storage snapshot, which is a {*}state in the past{*}, 
generally speaking.
 * We apply commands that were not applied in the snapshot.
 * We apply these commands to locally saved topology snapshot.
 * This logical topology snapshot has a *state in the future* when compared to 
storage snapshot.
 * As a result, when we re-apply some commands, we *increase the version* one 
more time, thus breaking data consistency between nodes.

This would have been fine if we only used this version locally. But 
distribution zones rely on the consistency of the version between all nodes in 
cluster. This might break DZ data nodes handling if any of the cluster nodes 
restarts.


> CMG commands idempotency is broken
> --
>
> Key: IGNITE-21588
> URL: https://issues.apache.org/jira/browse/IGNITE-21588
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we 
> do the following:
>  * Read local state with {{{}readLogicalTopology(){}}}.
>  * Modify state according to the command.
>  * {*}Increase version{*}.
>  * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}.
> The problem lies in the reading and writing of the state - it's local, and 
> the version value is not replicated.
> What happens when we restart the node:
>  * It starts without a local storage snapshot, with appliedIndex == 0, which 
> is a {*}state in the past{*}.
>  * We apply commands that were already applied before restart.
>  * We apply these commands to locally saved topology snapshot.
>  * This logical topology snapshot has a *state in the future* when compared 
> to appliedIndex == 0.
>  * As a result, when we re-apply some commands, we *increase the version* one 
> more time, thus breaking data consistency between nodes.
> This would have been fine if we only used this version locally. But 
> distribution zones rely on the consistency of the version between all nodes 
> in cluster. This might break DZ data nodes handling if any of the cluster 
> nodes restarts.
> How to fix:
>  * Either drop the storage if there's no storage snapshot; this will restore 
> consistency,
>  * or never start the CMG group from a snapshot, but rather start it from the 
> latest storage data.
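A hedged sketch of why re-applied commands break the version (hypothetical 
types; only readLogicalTopology(), saveSnapshotToStorage() and the command 
names come from the issue): the handler bumps a locally persisted version that 
is not part of the replicated state, so replaying the same command after a 
restart bumps it a second time:

{code:java}
// Hedged sketch, not the actual CMG state-machine code.
void onNodesLeave(NodesLeaveCommand cmd) {
    LogicalTopologySnapshot snapshot = readLogicalTopology(); // local read
    snapshot = snapshot.withoutNodes(cmd.nodes());
    snapshot = snapshot.withVersion(snapshot.version() + 1);  // not idempotent!
    saveSnapshotToStorage(snapshot);                          // local write
}
{code}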



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21588) CMG commands idempotency is broken

2024-02-22 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21588:
--

 Summary: CMG commands idempotency is broken
 Key: IGNITE-21588
 URL: https://issues.apache.org/jira/browse/IGNITE-21588
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we 
do the following:
 * Read local state with {{{}readLogicalTopology(){}}}.
 * Modify state according to the command.
 * {*}Increase version{*}.
 * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}.

The problem lies in the reading and writing of the state - it's local, and the 
version value is not replicated.

What happens when we restart the node:
 * It starts with a local storage snapshot, which is a {*}state in the past{*}, 
generally speaking.
 * We apply commands that were not applied in the snapshot.
 * We apply these commands to locally saved topology snapshot.
 * This logical topology snapshot has a *state in the future* when compared to 
storage snapshot.
 * As a result, when we re-apply some commands, we *increase the version* one 
more time, thus breaking data consistency between nodes.

This would have been fine if we only used this version locally. But 
distribution zones rely on the consistency of the version between all nodes in 
cluster. This might break DZ data nodes handling if any of the cluster nodes 
restarts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21548) Encapsulate Set

2024-02-16 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21548:
--

 Summary: Encapsulate Set
 Key: IGNITE-21548
 URL: https://issues.apache.org/jira/browse/IGNITE-21548
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov


Assignments may have some associated metadata, such as a "force" flag. We 
should prepare the code for introducing such metadata in the future.
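A hedged sketch of the encapsulation (hypothetical class shape; Assignment is 
the existing element type): wrapping the raw set lets metadata such as a 
"force" flag be added later without touching every call site:

{code:java}
import java.util.Set;

// Hedged sketch, not the actual Ignite 3 class.
final class Assignments {
    private final Set<Assignment> nodes;
    private final boolean force; // example of future metadata

    Assignments(Set<Assignment> nodes, boolean force) {
        this.nodes = Set.copyOf(nodes);
        this.force = force;
    }

    Set<Assignment> nodes() {
        return nodes;
    }

    boolean force() {
        return force;
    }
}
{code}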



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-18366) Simplify the configuration asm generator, phase 2

2024-02-16 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-18366:
---
Description: 
After the split, it makes sense to start simplifying every individual 
generator. This is partially a research issue. Exactly what to do is not clear 
yet.

Some context: classes in package 
{{org.apache.ignite.internal.configuration.asm}} are pretty big and 
complicated.  {{InnerNodeAsmGenerator}} is almost 2000 lines long.

How can we make it simpler? Better naming, more comments. Inner node generation 
can be split into multiple files, because it also handles polymorphic 
implementations.

In some cases I would change the generation itself. For example, generated 
methods in polymorphic instances have the same implementation as in the 
original inner node instead of simply delegating execution to the inner node. 
This affects both performance and the code of the generators in a negative way.

  was:After the split, it makes sense to start simplifying every individual 
generator. This is partially a research issue. Exactly what to do is not clear 
yet.


> Simplify the configuration asm generator, phase 2
> -
>
> Key: IGNITE-18366
> URL: https://issues.apache.org/jira/browse/IGNITE-18366
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: iep-55, ignite-3, technical-debt
> Fix For: 3.0.0-beta2
>
>
> After the split, it makes sense to start simplifying every individual 
> generator. This is partially a research issue. Exactly what to do is not 
> clear yet.
> Some context: classes in package 
> {{org.apache.ignite.internal.configuration.asm}} are pretty big and 
> complicated.  {{InnerNodeAsmGenerator}} is almost 2000 lines long.
> How can we make it simpler? Better naming, more comments. Inner node 
> generation can be split into multiple files, because it also handles 
> polymorphic implementations.
> In some cases I would change the generation itself. For example, generated 
> methods in polymorphic instances have the same implementation as in the 
> original inner node instead of simply delegating execution to the inner node. 
> This affects both performance and the code of the generators in a negative 
> way.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21302) Prohibit automatic group reconfiguration when there's no majority

2024-02-14 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21302.

Resolution: Won't Fix

This fix is not required. Data loss won't happen, for different reasons.

> Prohibit automatic group reconfiguration when there's no majority
> -
>
> Key: IGNITE-21302
> URL: https://issues.apache.org/jira/browse/IGNITE-21302
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The scaleDown timer should not lead to a situation where the user loses data.
> Default "changePeers" behavior also won't work, because there's no majority 
> and thus no leader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21501) Create index storages for new partitions on rebalance

2024-02-09 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21501:
---
Epic Link: IGNITE-20782

> Create index storages for new partitions on rebalance
> -
>
> Key: IGNITE-21501
> URL: https://issues.apache.org/jira/browse/IGNITE-21501
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> It appears that we only create index storages during the "table creation", 
> not during the "partition creation" if it's performed in isolation.
> Even if we did, 
> {{org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler#waitIndexes}}
>  is still badly designed, because it waits for indexes of the initial 
> partitions distribution and cannot provide any guarantees when assignments 
> are changed.
> This leads to NPEs or bizarre assertions related to the aforementioned method.
> What we need to do is:
>  * Get rid of the faulty index-awaiting mechanism.
>  * Create index storages before starting raft group.
>  * [optional] There might be naturally occurring "races" between catalog 
> updates (index creation) and rebalance. Right now they are resolved by the 
> fact that these processes are linearized in watch processing, but that's not 
> the best approach. If we could provide something more robust, that would have 
> been nice. Let's think about it at least.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21501) Create index storages for new partitions on rebalance

2024-02-09 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21501:
--

 Summary: Create index storages for new partitions on rebalance
 Key: IGNITE-21501
 URL: https://issues.apache.org/jira/browse/IGNITE-21501
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


It appears that we only create index storages during the "table creation", not 
during the "partition creation" if it's performed in isolation.

Even if we did, 
{{org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler#waitIndexes}}
 is still badly designed, because it waits for indexes of the initial 
partitions distribution and cannot provide any guarantees when assignments are 
changed.

This leads to NPEs or bizarre assertions related to the aforementioned method.

What we need to do is:
 * Get rid of the faulty index-awaiting mechanism.
 * Create index storages before starting raft group.
 * [optional] There might be naturally occurring "races" between catalog 
updates (index creation) and rebalance. Right now they are resolved by the fact 
that these processes are linearized in watch processing, but that's not the 
best approach. If we could provide something more robust, that would have been 
nice. Let's think about it at least.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21488) Disable thread assertions by default

2024-02-07 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21488.

  Reviewer: Ivan Bessonov
Resolution: Fixed

> Disable thread assertions by default
> 
>
> Key: IGNITE-21488
> URL: https://issues.apache.org/jira/browse/IGNITE-21488
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21469) AssertionError in checkpoint

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21469:
---
Epic Link: IGNITE-21444

> AssertionError in checkpoint
> 
>
> Key: IGNITE-21469
> URL: https://issues.apache.org/jira/browse/IGNITE-21469
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
>  
> {code:java}
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>  ~[?:?]   at 
> org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]   ... 1 more Caused by: java.lang.AssertionError: FullPageId 
> [pageId=000100020378, effectivePageId=00020378, groupId=886]  
>  at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185)
>  ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345)
>  ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]   ... 1 more{code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true]
>  
> The reason for the assertion is a bug/race in listener unregistration for 
> partition free-lists. We should fix it properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21469) AssertionError in checkpoint

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21469:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> AssertionError in checkpoint
> 
>
> Key: IGNITE-21469
> URL: https://issues.apache.org/jira/browse/IGNITE-21469
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
>  
> {code:java}
>   at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870) ~[?:?]
>   at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
>   at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
>   at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>   ... 1 more
> Caused by: java.lang.AssertionError: FullPageId [pageId=000100020378, effectivePageId=00020378, groupId=886]
>   at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>   ... 1 more{code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true]
>  
> The reason for the assertion is a bug/race in listener unregistration for 
> partition free-lists. We should handle the unregistration properly



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21469) AssertionError in checkpoint

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21469:
---
Labels: ignite-3  (was: )

> AssertionError in checkpoint
> 
>
> Key: IGNITE-21469
> URL: https://issues.apache.org/jira/browse/IGNITE-21469
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
>  
> {code:java}
>   at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870) ~[?:?]
>   at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
>   at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
>   at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>   ... 1 more
> Caused by: java.lang.AssertionError: FullPageId [pageId=000100020378, effectivePageId=00020378, groupId=886]
>   at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>   ... 1 more{code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true]
>  
> The reason for the assertion is a bug/race in listener unregistration for 
> partition free-lists. We should handle the unregistration properly



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21469) AssertionError in checkpoint

2024-02-06 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21469:
--

 Summary: AssertionError in checkpoint
 Key: IGNITE-21469
 URL: https://issues.apache.org/jira/browse/IGNITE-21469
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


 
{code:java}
  at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870) ~[?:?]
  at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
  at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
  at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
  at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
  ... 1 more
Caused by: java.lang.AssertionError: FullPageId [pageId=000100020378, effectivePageId=00020378, groupId=886]
  at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
  at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
  at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
  at org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
  at org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
  at org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
  at org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
  at org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
  at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
  at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
  ... 1 more{code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true]

 

The reason for the assertion is a bug/race in listener unregistration for 
partition free-lists. We should handle the unregistration properly
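
One way to avoid this class of race, as a minimal sketch only (the registry and 
listener names below are hypothetical, not the actual Ignite internals): keep 
checkpoint listeners in a copy-on-write collection so the checkpointer iterates 
over a stable snapshot while partition destruction unregisters concurrently.

{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical registry; illustrates race-free unregistration, not the real API. */
class CheckpointListenerRegistry {
    /** Copy-on-write: iteration always sees an immutable snapshot of the listeners. */
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    void register(Runnable listener) {
        listeners.add(listener);
    }

    /** Idempotent; safe to call from partition destruction during a checkpoint. */
    void unregister(Runnable listener) {
        listeners.remove(listener);
    }

    /** Invoked by the checkpointer to save free-list metadata. */
    void onCheckpoint() {
        for (Runnable listener : listeners) {
            listener.run(); // never observes a half-removed listener
        }
    }
}
{code}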



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21044) Investigate long table creation

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21044.

Resolution: Done

> Investigate long table creation
> ---
>
> Key: IGNITE-21044
> URL: https://issues.apache.org/jira/browse/IGNITE-21044
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> If we run a test that creates a lot of tables (more than 200, for example), we 
> soon start seeing a degradation in table creation time.
> In particular, handling of the corresponding Catalog update might take literal 
> seconds.
> One of the reasons is described here: 
> https://issues.apache.org/jira/browse/IGNITE-19913
> It explains why table creation might be slow, but it does not explain why it 
> degrades as we create more tables. So there are basically two issues:
>  * watch processing waits for unnecessary operations to complete
>  * those operations are too slow for some reason
> We need to investigate and fix both issues
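
A minimal reproduction sketch, assuming only a hypothetical {{DdlExecutor}} 
helper standing in for whatever entry point executes DDL (the real test would 
go through the SQL API or the Catalog directly): create several hundred tables 
and log the per-table creation time to make the degradation visible.

{code:java}
/** Reproduction sketch; "DdlExecutor" is a hypothetical stand-in for the DDL entry point. */
class TableCreationTimer {
    interface DdlExecutor {
        void execute(String ddl);
    }

    static void run(DdlExecutor ddl, int tableCount) {
        for (int i = 0; i < tableCount; i++) {
            long startNanos = System.nanoTime();

            ddl.execute("CREATE TABLE test_" + i + " (id INT PRIMARY KEY, val VARCHAR)");

            long millis = (System.nanoTime() - startNanos) / 1_000_000;

            // The degradation shows up as steadily growing creation times
            // once a few hundred tables exist.
            System.out.println("Table " + i + " created in " + millis + " ms");
        }
    }
}
{code}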



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21466) Add metrics for partition states

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21466:
--

Assignee: Ivan Bessonov

> Add metrics for partition states
> 
>
> Key: IGNITE-21466
> URL: https://issues.apache.org/jira/browse/IGNITE-21466
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21466) Add metrics for partition states

2024-02-06 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21466:
--

 Summary: Add metrics for partition states
 Key: IGNITE-21466
 URL: https://issues.apache.org/jira/browse/IGNITE-21466
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21465) Add system views for partition states

2024-02-06 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21465:
--

 Summary: Add system views for partition states
 Key: IGNITE-21465
 URL: https://issues.apache.org/jira/browse/IGNITE-21465
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21446) Import JVM args from build.gradle for JUnit run configurations

2024-02-05 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21446:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Import JVM args from build.gradle for JUnit run configurations
> --
>
> Key: IGNITE-21446
> URL: https://issues.apache.org/jira/browse/IGNITE-21446
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This should help running tests locally with IDEA runner on Java 17



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21446) Import JVM args from build.gradle for JUnit run configurations

2024-02-05 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21446:
---
Reviewer: Kirill Tkalenko

> Import JVM args from build.gradle for JUnit run configurations
> --
>
> Key: IGNITE-21446
> URL: https://issues.apache.org/jira/browse/IGNITE-21446
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This should help running tests locally with IDEA runner on Java 17



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21446) Import JVM args from build.gradle for JUnit run configurations

2024-02-05 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21446:
--

 Summary: Import JVM args from build.gradle for JUnit run 
configurations
 Key: IGNITE-21446
 URL: https://issues.apache.org/jira/browse/IGNITE-21446
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0.0-beta2


This should help running tests locally with IDEA runner on Java 17



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21434) Fail user write requests for non-available partitions

2024-02-02 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21434:
--

 Summary: Fail user write requests for non-available partitions
 Key: IGNITE-21434
 URL: https://issues.apache.org/jira/browse/IGNITE-21434
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Currently, {{INSERT INTO test VALUES(%d, %d);}} just hangs indefinitely, which 
is not what you would expect. We should either fail the request immediately if 
there's no majority, or return a replication timeout exception, for example.
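
A sketch of the second option only, under the assumption that the write path 
exposes its replication result as a {{CompletableFuture}}; 
{{ReplicationTimeoutException}} is a placeholder name, not a real Ignite class.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class WriteRequestTimeout {
    /** Placeholder exception; the real class and error code may differ. */
    static class ReplicationTimeoutException extends RuntimeException {
        ReplicationTimeoutException(Throwable cause) {
            super("Write was not replicated in time (possibly no majority)", cause);
        }
    }

    /** Caps the replication future so callers get an exception instead of hanging forever. */
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> replicationFut, long timeoutMillis) {
        return replicationFut
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> {
                    Throwable cause = ex instanceof CompletionException ? ex.getCause() : ex;
                    if (cause instanceof TimeoutException) {
                        throw new ReplicationTimeoutException(cause);
                    }
                    throw new CompletionException(cause);
                });
    }
}
{code}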



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-20067) Optimize "StorageUpdateHandler#handleUpdateAll"

2024-01-30 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-20067.

Fix Version/s: 3.0.0-beta2
 Reviewer: Ivan Bessonov
   Resolution: Fixed

> Optimize "StorageUpdateHandler#handleUpdateAll"
> ---
>
> Key: IGNITE-20067
> URL: https://issues.apache.org/jira/browse/IGNITE-20067
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Philipp Shergalis
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the current implementation, the size of a single batch inside 
> "runConsistently" is unpredictable, because the collection of rows is 
> received from the message.
> Generally speaking, it's a good idea to make the scope of a single 
> "runConsistently" smaller - it would lead to faster work in all storage 
> engines:
>  * for rocksdb, write batches would become smaller;
>  * for page memory, spikes on checkpoint would become smaller.
> There are two criteria that we could use:
>  * number of rows stored;
>  * cumulative number of inserted bytes.
> Raft does the same approximation when batching log records, for example. This 
> should not affect data consistency, because updateAll itself is idempotent by 
> its nature
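
A sketch of the two batching criteria, deliberately ignoring the real row type 
(plain byte arrays stand in for rows): split the incoming collection into 
chunks bounded by both a row count and a cumulative byte size, then run each 
chunk in its own closure. Because updateAll is idempotent, re-applying a 
partially applied sequence of chunks after a restart stays safe.

{code:java}
import java.util.ArrayList;
import java.util.List;

class UpdateAllBatcher {
    /** Splits rows into batches capped by row count and by cumulative size in bytes. */
    static List<List<byte[]>> split(List<byte[]> rows, int maxRows, long maxBytes) {
        List<List<byte[]>> batches = new ArrayList<>();

        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;

        for (byte[] row : rows) {
            boolean full = !current.isEmpty()
                    && (current.size() >= maxRows || currentBytes + row.length > maxBytes);

            if (full) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }

            current.add(row);
            currentBytes += row.length;
        }

        if (!current.isEmpty()) {
            batches.add(current);
        }

        return batches;
    }
}
{code}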



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21359) There are 2 RebalanceUtil classes

2024-01-25 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21359:
--

 Summary: There are 2 RebalanceUtil classes
 Key: IGNITE-21359
 URL: https://issues.apache.org/jira/browse/IGNITE-21359
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0.0-beta2


and they duplicate constants and methods. The least we could do is remove the 
code duplication and maybe rename one of these classes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21347) Fix license header extra whitespaces in ErrorCodeGroup annotation processor

2024-01-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21347:
---
Labels: ignite-3  (was: )

> Fix license header extra whitespaces in ErrorCodeGroup annotation processor 
> 
>
> Key: IGNITE-21347
> URL: https://issues.apache.org/jira/browse/IGNITE-21347
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Dmitrii Zabotlin
>Assignee: Dmitrii Zabotlin
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are extra whitespaces in the license headers in the generated error 
> codes files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21284) Internal API for manual raft group configuration update

2024-01-22 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21284:
--

Assignee: Ivan Bessonov

> Internal API for manual raft group configuration update
> ---
>
> Key: IGNITE-21284
> URL: https://issues.apache.org/jira/browse/IGNITE-21284
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> We need an API (with implementation) that's analogous to 
> "reset-lost-partitions", but with the ability to reuse a living minority of 
> nodes.
> This API should gather the states of partitions, identify healthy peers, and 
> use them as a new raft group configuration (through the update of 
> assignments).
> We have to make sure that the node with the latest log index becomes the 
> leader, so we will have to propagate the desired minimum log index in the 
> assignments and use it during voting.
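
A sketch of how such a reset could pick the new configuration, with 
hypothetical types ({{PeerState}}, {{ResetPlan}}) and no claim about the real 
internal interfaces: keep only the healthy peers and carry the highest applied 
log index along with the new assignments, so the vote can require it.

{code:java}
import java.util.List;

/** Illustrative only; the record names and fields are hypothetical. */
class ManualGroupReconfiguration {
    record PeerState(String consistentId, boolean healthy, long appliedLogIndex) { }

    record ResetPlan(List<String> newPeers, long requiredMinLogIndex) { }

    static ResetPlan plan(List<PeerState> peers) {
        List<PeerState> healthy = peers.stream().filter(PeerState::healthy).toList();

        if (healthy.isEmpty()) {
            throw new IllegalStateException("No healthy peers to form a new configuration");
        }

        // The peer with the highest applied index must be able to win the election,
        // so the required minimum travels with the new assignments and is checked
        // during voting.
        long minIndex = healthy.stream().mapToLong(PeerState::appliedLogIndex).max().orElseThrow();

        List<String> newPeers = healthy.stream().map(PeerState::consistentId).toList();

        return new ResetPlan(newPeers, minIndex);
    }
}
{code}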



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21309) DirectMessageWriter keeps holding used buffers

2024-01-18 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21309:
---
Reviewer: Kirill Tkalenko

> DirectMessageWriter keeps holding used buffers
> --
>
> Key: IGNITE-21309
> URL: https://issues.apache.org/jira/browse/IGNITE-21309
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Thread-local optimized marshallers store references to write buffers in their 
> internal stacks, which could lead to occasional OOMs. We should release 
> buffers after writing nested messages in DirectMessageWriter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21309) DirectMessageWriter keeps holding used buffers

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21309:
--

 Summary: DirectMessageWriter keeps holding used buffers
 Key: IGNITE-21309
 URL: https://issues.apache.org/jira/browse/IGNITE-21309
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0.0-beta2


Thread-local optimized marshallers store references to write buffers in their 
internal stacks, which could lead to occasional OOMs. We should release buffers 
after writing nested messages in DirectMessageWriter.
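
A sketch of the intended buffer lifecycle with hypothetical names (this is not 
the real DirectMessageWriter): once a nested message is fully written, the 
buffer reference is dropped instead of being cached on the thread-local stack 
indefinitely.

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical skeleton illustrating the release, not the real writer. */
class NestedMessageBufferStack {
    private final Deque<ByteBuffer> stack = new ArrayDeque<>();

    /** Pushes a fresh buffer for a nested message. */
    ByteBuffer push(int capacity) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        stack.push(buf);
        return buf;
    }

    /** Called once the nested message is fully written. */
    void popAndRelease() {
        // Dropping the reference lets the buffer be collected; keeping it on a
        // thread-local stack forever is what leads to the occasional OOMs.
        stack.pop();
    }
}
{code}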



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21305) Internal API for truncating log suffix

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21305:
--

 Summary: Internal API for truncating log suffix
 Key: IGNITE-21305
 URL: https://issues.apache.org/jira/browse/IGNITE-21305
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


An API and implementation are needed to truncate the log suffix of peers in the 
ERROR state that cannot proceed with applying commands
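
One possible shape for such an internal API, sketched with hypothetical names 
and parameters only:

{code:java}
/** Hypothetical interface; the method name and parameters are illustrative. */
interface RaftLogTruncationService {
    /**
     * Truncates the raft log of the given partition, keeping entries up to
     * {@code lastKeptIndex} inclusive. Intended for peers stuck in the ERROR
     * state that cannot make progress applying commands.
     */
    void truncateSuffix(String zoneName, int partitionId, long lastKeptIndex);
}
{code}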



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21305) Internal API for truncating log suffix

2024-01-18 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21305:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Internal API for truncating log suffix
> --
>
> Key: IGNITE-21305
> URL: https://issues.apache.org/jira/browse/IGNITE-21305
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>
> An API and implementation are needed to truncate the log suffix of peers in 
> the ERROR state that cannot proceed with applying commands



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21256) Internal API for local partition states

2024-01-18 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21256:
---
Description: 
Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
list. We need an API (with implementation) to access the list of local 
partitions and their states. The way to determine them:
 * compare current assignments with replica states
 * check the state machine; it might be broken or installing a snapshot

  was:
Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
list. We need an API to access the list of local partitions and their states. 
The way to determine them:
 * compare current assignments with replica states
 * check the state machine; it might be broken or installing a snapshot


> Internal API for local partition states
> ---
>
> Key: IGNITE-21256
> URL: https://issues.apache.org/jira/browse/IGNITE-21256
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
> list. We need an API (with implementation) to access the list of local 
> partitions and their states. The way to determine them:
>  * compare current assignments with replica states
>  * check the state machine; it might be broken or installing a snapshot
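
A sketch of the resolution logic, using the state names from the IGNITE-21140 
epic; the boolean inputs are simplified stand-ins for the real replica and raft 
checks, not an actual API.

{code:java}
/** State names follow the IGNITE-21140 definitions; inputs are simplified. */
enum LocalPartitionState {
    HEALTHY, INITIALIZING, INSTALLING_SNAPSHOT, CATCHING_UP, BROKEN;

    static LocalPartitionState resolve(
            boolean raftGroupInitialized,
            boolean installingSnapshot,
            boolean stateMachineBroken,
            boolean behindLeader
    ) {
        if (stateMachineBroken) {
            return BROKEN; // won't recover without intervention
        }
        if (!raftGroupInitialized) {
            return INITIALIZING;
        }
        if (installingSnapshot) {
            return INSTALLING_SNAPSHOT; // full state transfer in progress
        }
        // Catching-up is only observable from the leader, which knows the
        // latest committed index of every peer.
        return behindLeader ? CATCHING_UP : HEALTHY;
    }
}
{code}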



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21284) Internal API for manual raft group configuration update

2024-01-18 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21284:
---
Description: 
We need an API (with implementation) that's analogous to 
"reset-lost-partitions", but with the ability to reuse a living minority of 
nodes.

This API should gather the states of partitions, identify healthy peers, and 
use them as a new raft group configuration (through the update of assignments).

We have to make sure that the node with the latest log index becomes the 
leader, so we will have to propagate the desired minimum log index in the 
assignments and use it during voting.

  was:
We need an API that's analogous to "reset-lost-partitions", but with the 
ability to reuse a living minority of nodes.

This API should gather the states of partitions, identify healthy peers, and 
use them as a new raft group configuration (through the update of assignments).

We have to make sure that the node with the latest log index becomes the 
leader, so we will have to propagate the desired minimum log index in the 
assignments and use it during voting.


> Internal API for manual raft group configuration update
> ---
>
> Key: IGNITE-21284
> URL: https://issues.apache.org/jira/browse/IGNITE-21284
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> We need an API (with implementation) that's analogous to 
> "reset-lost-partitions", but with the ability to reuse a living minority of 
> nodes.
> This API should gather the states of partitions, identify healthy peers, and 
> use them as a new raft group configuration (through the update of 
> assignments).
> We have to make sure that the node with the latest log index becomes the 
> leader, so we will have to propagate the desired minimum log index in the 
> assignments and use it during voting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21304) Internal API for restarting partitions

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21304:
--

 Summary: Internal API for restarting partitions
 Key: IGNITE-21304
 URL: https://issues.apache.org/jira/browse/IGNITE-21304
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


An API and implementation should be provided for restarting peers in raft groups.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21303) Exclude nodes in "error" state from manual group reconfiguration

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21303:
--

 Summary: Exclude nodes in "error" state from manual group 
reconfiguration
 Key: IGNITE-21303
 URL: https://issues.apache.org/jira/browse/IGNITE-21303
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Instead of simply using the existing set of nodes as a baseline for new 
assignments, we should either exclude peers in the ERROR state from it, or 
force data cleanup on such nodes. A third option is to forbid such 
reconfiguration, forcing the user to clear ERROR peers in advance



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21302) Prohibit automatic group reconfiguration when there's no majority

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21302:
--

 Summary: Prohibit automatic group reconfiguration when there's no 
majority
 Key: IGNITE-21302
 URL: https://issues.apache.org/jira/browse/IGNITE-21302
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


The scaleDown timer should not lead to a situation where the user loses data.

The default "changePeers" behavior also won't work, because there's no majority 
and thus no leader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21301) Sync raft log before flush in all storage engines

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21301:
--

 Summary: Sync raft log before flush in all storage engines
 Key: IGNITE-21301
 URL: https://issues.apache.org/jira/browse/IGNITE-21301
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Checkpoints and RocksDB flushes should sync the raft log before they complete 
writing data to disk, if "fsync" is disabled
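
A sketch of the required ordering, with hypothetical hooks ({{RaftLog}}, 
{{Storage}}): when fsync is disabled, the log must be synced before the 
checkpoint or flush is considered complete, otherwise the flushed state may 
reference log entries that are not yet durable.

{code:java}
/** Illustrative ordering only; both interfaces are hypothetical stand-ins. */
class FlushWithLogSync {
    interface RaftLog {
        void sync(); // force log entries to disk
    }

    interface Storage {
        void flush(); // checkpoint or RocksDB flush
    }

    static void flush(RaftLog raftLog, Storage storage, boolean fsyncEnabled) {
        if (!fsyncEnabled) {
            // Without per-write fsync the log may lag behind the storage on disk;
            // syncing first guarantees the flushed state is replayable on restart.
            raftLog.sync();
        }
        storage.flush();
    }
}
{code}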



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21300) Implement disaster recovery for secondary indexes

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21300:
--

 Summary: Implement disaster recovery for secondary indexes
 Key: IGNITE-21300
 URL: https://issues.apache.org/jira/browse/IGNITE-21300
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


It is possible that, if we lose part of the log, some available indexes become 
"locally" unavailable. In such a case we will have to finish the build process 
a second time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21299) Rest API for disaster recovery commands

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21299:
--

 Summary: Rest API for disaster recovery commands
 Key: IGNITE-21299
 URL: https://issues.apache.org/jira/browse/IGNITE-21299
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Please refer to https://issues.apache.org/jira/browse/IGNITE-21298 for a list



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21298) CLI for disaster recovery commands

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21298:
--

 Summary: CLI for disaster recovery commands
 Key: IGNITE-21298
 URL: https://issues.apache.org/jira/browse/IGNITE-21298
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Names might change.
 * ignite restart-partitions --nodes <nodes> [--zones <zones>] 
[--partitions <partitions>] [--purge]

 * ignite reset-lost-partitions [--zones <zones>] [--partitions <partitions>]

 * ignite truncate-log-suffix --zone <zone> --partition <partition> 
--index <index>

 * ignite partition-states [--local [--nodes <nodes>] | --global] [--zones 
<zones>] [--partitions <partitions>]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21295) Public Java API for manual raft group configuration update

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21295:
--

 Summary: Public Java API for manual raft group configuration update
 Key: IGNITE-21295
 URL: https://issues.apache.org/jira/browse/IGNITE-21295
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Implement public API for IGNITE-21284



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21257) Public Java API to get global partition states

2024-01-17 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21257:
---
Summary: Public Java API to get global partition states  (was: Public API 
to get global partition states)

> Public Java API to get global partition states
> --
>
> Key: IGNITE-21257
> URL: https://issues.apache.org/jira/browse/IGNITE-21257
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
> list.
> We should use the local partition states implemented in IGNITE-21256 and 
> combine them in a cluster-wide compute call before returning the result to 
> the user.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21284) Internal API for manual raft group configuration update

2024-01-17 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21284:
--

 Summary: Internal API for manual raft group configuration update
 Key: IGNITE-21284
 URL: https://issues.apache.org/jira/browse/IGNITE-21284
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


We need an API that's analogous to "reset-lost-partitions", but with the 
ability to reuse a living minority of nodes.

This API should gather the states of partitions, identify healthy peers, and 
use them as a new raft group configuration (through the update of assignments).

We have to make sure that the node with the latest log index becomes the 
leader, so we will have to propagate the desired minimum log index in the 
assignments and use it during voting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21256) Internal API for local partition states

2024-01-17 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21256:
--

Assignee: Ivan Bessonov

> Internal API for local partition states
> ---
>
> Key: IGNITE-21256
> URL: https://issues.apache.org/jira/browse/IGNITE-21256
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
> list. We need an API to access the list of local partitions and their states. 
> The way to determine them:
>  * compare current assignments with replica states
>  * check the state machine; it might be broken or installing a snapshot



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21234) Acquired checkpoint read lock waits for scheduled checkpoint write unlock sometimes

2024-01-15 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21234:
---
Reviewer: Kirill Tkalenko

> Acquired checkpoint read lock waits for scheduled checkpoint write unlock 
> sometimes
> ---
>
> Key: IGNITE-21234
> URL: https://issues.apache.org/jira/browse/IGNITE-21234
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In a situation where we have "too many dirty pages", we trigger a checkpoint 
> and wait until it starts. This can take seconds, because we have to flush 
> free-lists before acquiring the checkpoint write lock. This can cause severe 
> dips in performance for no good reason.
> I suggest introducing two modes for triggering checkpoints when we have too 
> many dirty pages: a soft threshold and a hard threshold.
>  * soft - trigger a checkpoint, but don't wait for its start; just continue 
> all operations as usual. Make it like the current threshold - 75% of any 
> existing memory segment must be dirty.
>  * hard - trigger a checkpoint and wait until it starts, the way it behaves 
> right now. Make it higher than the current threshold - 90% of any existing 
> memory segment must be dirty.
> Maybe we should use different values for the thresholds; that should be 
> discussed during the review
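
A sketch of the proposed policy, reusing the 75%/90% values from the 
description (the final numbers are up for review; the enum and method names are 
illustrative):

{code:java}
/** Two-threshold trigger sketch; the 0.75/0.90 values mirror the description. */
class DirtyPagesCheckpointTrigger {
    enum Action {
        NONE,
        TRIGGER_ASYNC,     // soft: schedule a checkpoint, don't block writers
        TRIGGER_AND_WAIT   // hard: current behavior, wait for the checkpoint to start
    }

    static Action onDirtyRatio(double dirtyRatio) {
        if (dirtyRatio >= 0.90) {
            return Action.TRIGGER_AND_WAIT;
        }
        if (dirtyRatio >= 0.75) {
            return Action.TRIGGER_ASYNC;
        }
        return Action.NONE;
    }
}
{code}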



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21257) Public API to get global partition states

2024-01-15 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21257:
--

 Summary: Public API to get global partition states
 Key: IGNITE-21257
 URL: https://issues.apache.org/jira/browse/IGNITE-21257
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the list.

We should use the local partition states implemented in IGNITE-21256 and 
combine them in a cluster-wide compute call before returning the result to the 
user.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21256) Internal API for local partition states

2024-01-15 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21256:
--

 Summary: Internal API for local partition states
 Key: IGNITE-21256
 URL: https://issues.apache.org/jira/browse/IGNITE-21256
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
list. We need an API to access the list of local partitions and their states. 
The way to determine them:
 * compare current assignments with replica states
 * check the state machine; it might be broken or installing a snapshot



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21140) Ignite 3 Disaster Recovery

2024-01-15 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21140:
---
Description: 
This epic is related to issues that users may face when part of their data 
becomes unavailable for some reason, like "node is lost" or "part of the 
storage is lost", etc.

The following definitions will be used throughout:

Local partition states. A local property of a replica, storage, state machine, 
etc., associated with the partition:
 * _Healthy_
State machine is running, everything’s fine.
 * _Initializing_
Ignite node is online, but the corresponding raft group is yet to complete its 
initialization.
 * _Snapshot installation_
Full state transfer is taking place. Once it’s finished, the partition will 
become _healthy_ or {_}catching-up{_}. Before that, data can’t be read, and log 
replication is also on pause.
 * _Catching-up_
Node is in the process of replicating data from the leader, and its data is a 
little bit in the past. This state can only be observed from the leader, 
because only the leader has the latest committed index and the state of every 
peer.
 * _Broken_
Something’s wrong with the state machine. Some data might be unavailable for 
reading, log can’t be replicated, and this state won’t be changed automatically 
without intervention.

 * Global partition states. A global property of a partition that specifies 
its apparent functionality from the user’s point of view:
 * _Available partition_
Healthy partition that can process read and write requests. This means that the 
majority of peers are healthy at the moment.
 * _Read-only partition_
Partition that can process read requests, but can’t process write requests. 
There’s no healthy majority, but there’s at least one alive (healthy/catch-up) 
peer that can process historical read-only queries.
 * _Unavailable partition_
Partition that can’t process any requests.

Building blocks are a set of operations that can be executed by Ignite or by 
the user in order to improve cluster state.

Each building block must either be an automatic action with configurable 
timeout (if applicable), or a documented API, with mandatory 
diagnostics/metrics that would allow users to make decisions about these 
actions.
 # Offline Ignite node is brought back online, having all recent data.
_Not a disaster recovery mechanism, but worth mentioning._
A node with usable data that doesn’t require full state transfer will become a 
peer and participate in voting and replication, allowing the partition to be 
_available_ if the majority is healthy. This is the best case for the user, 
where they simply restart offline nodes and the cluster continues being 
operable.
 # Automatic group scale-down.
Should happen when an Ignite node is offline for too long.
Not a disaster recovery mechanism, but worth mentioning.
Only happens when the majority is online, meaning that user data is safe.
 # Manual partition restart.
Should be performed manually for broken peers.
 # Manual group peers/learners reconfiguration.
Should be performed on a group manually, if the majority is considered 
permanently lost.
 # Freshly re-entering the group.
Should happen when an Ignite node is returned back to the group, but partition 
data is missing.
 # Cleaning the partition data.
If, for some reason, we know that a certain partition on a certain node is 
broken, we may ask Ignite to drop its data and re-enter the group empty (as 
stated in option 5).
Having a dedicated operation for cleaning the partition is preferable, because:
 ## a partition is stored in several storages
 ## not all of them have a “file per partition” storage format, not even close
 ## there’s also the raft log that should be cleaned, most likely
 ## and maybe the raft meta as well
 # Partial truncation of the log’s suffix.
This is a case of partial cleanup of partition data. This operation might be 
useful if we know that there’s junk in the log, but storages are not corrupted, 
so there’s a chance to save some data. Can be replaced with “clean partition 
data”.

In order for the user to make decisions about manual operations, we must 
provide partition states for all partitions in all tables/zones. Both global 
and local states. Global states are more important, because they directly 
correlate with user experience.

Some states will automatically lead to “available” partitions, if the system 
overall is healthy and we simply wait for some time. For example, we wait until 
a snapshot installation, or a rebalance is complete, and we’re happy. This is 
not considered a building block, because it’s a natural artifact of the 
architecture.

The current list is not exhaustive; it consists of basic actions that we could 
implement that would cover a wide range of potential issues.
Any other addition to the list of basic blocks would simply refine it, 
potentially allowing users to recover faster, or with less data being lost.
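
As an illustration of the global-state definitions above, a minimal derivation 
sketch; the peer counts are simplified inputs, not a real API.

{code:java}
/** Derives the global state from per-peer states, per the definitions above. */
enum GlobalPartitionState {
    AVAILABLE, READ_ONLY, UNAVAILABLE;

    static GlobalPartitionState resolve(int totalPeers, int healthyPeers, int alivePeers) {
        if (healthyPeers > totalPeers / 2) {
            return AVAILABLE; // healthy majority: reads and writes work
        }
        if (alivePeers > 0) {
            return READ_ONLY; // at least one healthy/catching-up peer: historical reads
        }
        return UNAVAILABLE;
    }
}
{code}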

[jira] [Created] (IGNITE-21245) Don't store applied revision in Vault

2024-01-12 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21245:
--

 Summary: Don't store applied revision in Vault
 Key: IGNITE-21245
 URL: https://issues.apache.org/jira/browse/IGNITE-21245
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0.0-beta2


In a newer local node recovery implementation we stopped relying on Vault data, 
but didn't remove APPLIED_REV_KEY, which might confuse some developers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21140) Ignite 3 Disaster Recovery

2024-01-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21140:
---
Description: 
This epic is related to issues that users may face when part of their data 
becomes unavailable for some reason, like "node is lost" or "part of the 
storage is lost", etc.

The following definitions will be used throughout:
 * Local partition states. A local property of a replica, storage, state 
machine, etc., associated with the partition:
 * _Healthy_
State machine is running, everything’s fine.
 * _Initializing_
Ignite node is online, but the corresponding raft group is yet to complete its 
initialization.
 * _Snapshot installation_
Full state transfer is taking place. Once it’s finished, the partition will 
become _healthy_ or {_}catching-up{_}. Before that, data can’t be read, and log 
replication is also on pause.
 * _Catching-up_
Node is in the process of replicating data from the leader, and its data is a 
little bit in the past. This state can only be observed from the leader, 
because only the leader has the latest committed index and the state of every 
peer.
 * _Broken_
Something’s wrong with the state machine. Some data might be unavailable for 
reading, log can’t be replicated, and this state won’t be changed automatically 
without intervention.

 * Global partition states. A global property of a partition that specifies 
its apparent functionality from the user’s point of view:
 * _Available partition_
Healthy partition that can process read and write requests. This means that the 
majority of peers are healthy at the moment.
 * _Read-only partition_
Partition that can process read requests, but can’t process write requests. 
There’s no healthy majority, but there’s at least one alive (healthy/catch-up) 
peer that can process historical read-only queries.
 * _Unavailable partition_
Partition that can’t process any requests.

Building blocks are a set of operations that can be executed by Ignite or by 
the user in order to improve cluster state.

Each building block must either be an automatic action with configurable 
timeout (if applicable), or a documented API, with mandatory 
diagnostics/metrics that would allow users to make decisions about these 
actions.
 # Offline Ignite node is brought back online, having all recent data.
_Not a disaster recovery mechanism, but worth mentioning._
A node with usable data that doesn’t require full state transfer will become a 
peer and participate in voting and replication, allowing the partition to be 
_available_ if the majority is healthy. This is the best case for the user, 
where they simply restart offline nodes and the cluster continues being 
operable.
 # Automatic group scale-down.
Should happen when an Ignite node is offline for too long.
Not a disaster recovery mechanism, but worth mentioning.
Only happens when the majority is online, meaning that user data is safe.
 # Manual partition restart.
Should be performed manually for broken peers.
 # Manual group peers/learners reconfiguration.
Should be performed on a group manually, if the majority is considered 
permanently lost.
 # Freshly re-entering the group.
Should happen when an Ignite node is returned back to the group, but partition 
data is missing.
 # Cleaning the partition data.
If, for some reason, we know that a certain partition on a certain node is 
broken, we may ask Ignite to drop its data and re-enter the group empty (as 
stated in option 5).
Having a dedicated operation for cleaning the partition is preferable, because:
 ## a partition is stored in several storages
 ## not all of them have a “file per partition” storage format, not even close
 ## there’s also the raft log that should be cleaned, most likely
 ## and maybe the raft meta as well
 # Partial truncation of the log’s suffix.
This is a case of partial cleanup of partition data. This operation might be 
useful if we know that there’s junk in the log, but storages are not corrupted, 
so there’s a chance to save some data. Can be replaced with “clean partition 
data”.

In order for the user to make decisions about manual operations, we must 
provide partition states for all partitions in all tables/zones. Both global 
and local states. Global states are more important, because they directly 
correlate with user experience.

Some states will automatically lead to “available” partitions, if the system 
overall is healthy and we simply wait for some time. For example, we wait until 
a snapshot installation, or a rebalance is complete, and we’re happy. This is 
not considered a building block, because it’s a natural artifact of the 
architecture.

The current list is not exhaustive; it consists of basic actions that we could 
implement that would cover a wide range of potential issues.
Any other addition to the list of basic blocks would simply refine it, 
potentially allowing users to recover faster, or with less data being lost.

[jira] [Updated] (IGNITE-21140) Ignite 3 Disaster Recovery

2024-01-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21140:
---
Description: 
This epic is related to issues that users may face when part of their data 
becomes unavailable for some reason, like "node is lost" or "part of the 
storage is lost", etc.

The following definitions will be used throughout:
 * Local partition states. A local property of a replica, storage, state 
machine, etc., associated with the partition:
 * {_}Healthy{_}
State machine is running, everything’s fine.
 * {_}Initializing{_}
Ignite node is online, but the corresponding raft group is yet to complete its 
initialization.
 * {_}Snapshot installation{_}
Full state transfer is taking place. Once it’s finished, the partition will 
become _healthy_ or {_}catching-up{_}. Before that, data can’t be read, and log 
replication is also on pause.
 * {_}Catching-up{_}
Node is in the process of replicating data from the leader, and its data is a 
little bit in the past. This state can only be observed from the leader, 
because only the leader has the latest committed index and the state of every 
peer.
 * {_}Broken{_}
Something’s wrong with the state machine. Some data might be unavailable for 
reading, log can’t be replicated, and this state won’t be changed automatically 
without intervention.


 * Global partition states. A global property of a partition that specifies 
its apparent functionality from the user’s point of view:
 * {_}Available partition{_}
Healthy partition that can process read and write requests. This means that the 
majority of peers are healthy at the moment.
 * {_}Read-only partition{_}
Partition that can process read requests, but can’t process write requests. 
There’s no healthy majority, but there’s at least one alive (healthy/catch-up) 
peer that can process historical read-only queries.
 * {_}Unavailable partition{_}
Partition that can’t process any requests.


Building blocks are a set of operations that can be executed by Ignite or by 
the user in order to improve cluster state.

Each building block must either be an automatic action with configurable 
timeout (if applicable), or a documented API, with mandatory 
diagnostics/metrics that would allow users to make decisions about these 
actions.
 # Offline Ignite node is brought back online, having all recent data.
{_}Not a disaster recovery mechanism, but worth mentioning.{_}
A node with usable data that doesn’t require full state transfer will become a 
peer and participate in voting and replication, allowing the partition to be 
_available_ if the majority is healthy. This is the best case for the user, 
where they simply restart offline nodes and the cluster continues being 
operable.


 # Automatic group scale-down.
Should happen when an Ignite node is offline for too long.
Not a disaster recovery mechanism, but worth mentioning.
Only happens when the majority is online, meaning that user data is safe.


 # Manual partition restart.
Should be performed manually for broken peers.


 # Manual group peers/learners reconfiguration.
Should be performed on a group manually, if the majority is considered 
permanently lost.


 # Freshly re-entering the group.
Should happen when an Ignite node is returned back to the group, but partition 
data is missing.


 # Cleaning the partition data.
If, for some reason, we know that a certain partition on a certain node is 
broken, we may ask Ignite to drop its data and re-enter the group empty (as 
stated in option 5).
Having a dedicated operation for cleaning the partition is preferable, because:
 - a partition is stored in several storages
 - not all of them have a “file per partition” storage format, not even close
 - there’s also the raft log that should be cleaned, most likely
 - and maybe the raft meta as well


 # Partial truncation of the log’s suffix.
This is a case of partial cleanup of partition data. This operation might be 
useful if we know that there’s junk in the log, but storages are not corrupted, 
so there’s a chance to save some data. Can be replaced with “clean partition 
data”.

In order for the user to make decisions about manual operations, we must 
provide partition states for all partitions in all tables/zones. Both global 
and local states. Global states are more important, because they directly 
correlate with user experience.

Some states will automatically lead to “available” partitions, if the system 
overall is healthy and we simply wait for some time. For example, we wait until 
a snapshot installation, or a rebalance is complete, and we’re happy. This is 
not considered a building block, because it’s a natural artifact of the 
architecture.

The current list is not exhaustive; it consists of basic actions that we could 
implement that would cover a wide range of potential issues.
Any other addition to the list of basic blocks would simply refine it, 
potentially allowing users to recover faster, or with less data being lost.

[jira] [Updated] (IGNITE-21234) Acquired checkpoint read lock waits for scheduled checkpoint write unlock sometimes

2024-01-11 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21234:
---
Summary: Acquired checkpoint read lock waits for scheduled checkpoint write 
unlock sometimes  (was: Checkpoint read lock waits for checkpoint write unlock 
sometimes)

> Acquired checkpoint read lock waits for scheduled checkpoint write unlock 
> sometimes
> ---
>
> Key: IGNITE-21234
> URL: https://issues.apache.org/jira/browse/IGNITE-21234
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In a situation where we have "too many dirty pages", we trigger a checkpoint 
> and wait until it starts. This can take seconds, because we have to flush 
> free-lists before acquiring the checkpoint write lock. This can cause severe 
> dips in performance for no good reason.
> I suggest introducing two modes for triggering checkpoints when we have too 
> many dirty pages: a soft threshold and a hard threshold.
>  * soft - trigger a checkpoint, but don't wait for its start; just continue 
> all operations as usual. Make it like the current threshold - 75% of any 
> existing memory segment must be dirty.
>  * hard - trigger a checkpoint and wait until it starts, the way it behaves 
> right now. Make it higher than the current threshold - 90% of any existing 
> memory segment must be dirty.
> Maybe we should use different values for the thresholds; that should be 
> discussed during the review



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21234) Checkpoint read lock waits for checkpoint write unlock sometimes

2024-01-11 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21234:
---
Description: 
In a situation where we have "too many dirty pages", we trigger a checkpoint 
and wait until it starts. This can take seconds, because we have to flush 
free-lists before acquiring the checkpoint write lock. This can cause severe 
dips in performance for no good reason.

I suggest introducing two modes for triggering checkpoints when we have too 
many dirty pages: a soft threshold and a hard threshold.
 * soft - trigger a checkpoint, but don't wait for its start; just continue all 
operations as usual. Make it like the current threshold - 75% of any existing 
memory segment must be dirty.
 * hard - trigger a checkpoint and wait until it starts, the way it behaves 
right now. Make it higher than the current threshold - 90% of any existing 
memory segment must be dirty.

Maybe we should use different values for the thresholds; that should be 
discussed during the review

  was:
In a situation where we have "too many dirty pages", we trigger a checkpoint 
and wait until it starts. This can take seconds, because we have to flush 
free-lists before acquiring the checkpoint write lock. This can cause severe 
dips in performance for no good reason.

I suggest introducing two modes for triggering checkpoints when we have too 
many dirty pages: a soft threshold and a hard threshold.
 * soft - trigger a checkpoint, but don't wait for its start; just continue all 
operations as usual. Make it like the current threshold - 75% of any existing 
memory segment must be dirty.
 * hard - trigger a checkpoint and wait until it starts, the way it behaves 
right now. Make it higher than the current threshold - 90% of any existing 
memory segment must be dirty.


> Checkpoint read lock waits for checkpoint write unlock sometimes
> 
>
> Key: IGNITE-21234
> URL: https://issues.apache.org/jira/browse/IGNITE-21234
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In a situation where we have "too many dirty pages", we trigger a checkpoint 
> and wait until it starts. This can take seconds, because we have to flush 
> free-lists before acquiring the checkpoint write lock. This can cause severe 
> dips in performance for no good reason.
> I suggest introducing two modes for triggering checkpoints when we have too 
> many dirty pages: a soft threshold and a hard threshold.
>  * soft - trigger a checkpoint, but don't wait for its start; just continue 
> all operations as usual. Make it like the current threshold - 75% of any 
> existing memory segment must be dirty.
>  * hard - trigger a checkpoint and wait until it starts, the way it behaves 
> right now. Make it higher than the current threshold - 90% of any existing 
> memory segment must be dirty.
> Maybe we should use different values for the thresholds; that should be 
> discussed during the review



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21234) Checkpoint read lock waits for checkpoint write unlock sometimes

2024-01-11 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21234:
--

 Summary: Checkpoint read lock waits for checkpoint write unlock 
sometimes
 Key: IGNITE-21234
 URL: https://issues.apache.org/jira/browse/IGNITE-21234
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov


In a situation where we have "too many dirty pages", we trigger a checkpoint 
and wait until it starts. This can take seconds, because we have to flush 
free-lists before acquiring the checkpoint write lock. This can cause severe 
dips in performance for no good reason.

I suggest introducing two modes for triggering checkpoints when we have too 
many dirty pages: a soft threshold and a hard threshold.
 * soft - trigger a checkpoint, but don't wait for its start; just continue all 
operations as usual. Make it like the current threshold - 75% of any existing 
memory segment must be dirty.
 * hard - trigger a checkpoint and wait until it starts, the way it behaves 
right now. Make it higher than the current threshold - 90% of any existing 
memory segment must be dirty.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21140) Ignite 3 Disaster Recovery

2024-01-10 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21140:
---
Description: This epic is related to issues that users may face when part of 
their data becomes unavailable for some reason, like "node is lost" or "part of 
the storage is lost", etc.

> Ignite 3 Disaster Recovery
> --
>
> Key: IGNITE-21140
> URL: https://issues.apache.org/jira/browse/IGNITE-21140
> Project: Ignite
>  Issue Type: Epic
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> This epic is related to issues that users may face when part of their data 
> becomes unavailable for various reasons, like "node is lost", "part of the 
> storage is lost", etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-20051) Add startup recovery to SchemaManager

2024-01-09 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-20051.

Resolution: Fixed

> Add startup recovery to SchemaManager
> -
>
> Key: IGNITE-20051
> URL: https://issues.apache.org/jira/browse/IGNITE-20051
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> Currently, {{SchemaManager}} does not implement a proper recovery procedure 
> at start. It needs a way to get all tables from the CatalogService (including 
> the dropped ones). It must also make sure that table versions that were 
> missed while the node was offline are added to the schemas storage as a 
> result of recovery.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21205) Don't store table versions in meta-storage in SchemaManager

2024-01-08 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21205:
---
Description: 
Current implementation blocks meta-storage watch thread (doesn't allow new 
watch events to be processed, to be precise) until new schema version is 
written into the meta-storage. This is an expensive IO operation, and it might 
introduce unexpected turbulence.

Some scenarios greatly suffer from it. For example, we can't process lease 
updates while we're writing into meta-storage, which shouldn't be the case.

  was:
The current implementation blocks the meta-storage watch thread until the new 
schema version is written into the meta-storage. This is an expensive IO 
operation, and it might introduce unexpected turbulence.

Some scenarios suffer greatly from it. For example, we can't process lease 
updates while we're writing into the meta-storage, which shouldn't be the case.


> Don't store table versions in meta-storage in SchemaManager
> ---
>
> Key: IGNITE-21205
> URL: https://issues.apache.org/jira/browse/IGNITE-21205
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current implementation blocks the meta-storage watch thread (doesn't 
> allow new watch events to be processed, to be precise) until the new schema 
> version is written into the meta-storage. This is an expensive IO operation, 
> and it might introduce unexpected turbulence.
> Some scenarios suffer greatly from it. For example, we can't process lease 
> updates while we're writing into the meta-storage, which shouldn't be the case.
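
The ticket title suggests the eventual fix is to avoid the meta-storage write 
entirely. Purely to illustrate the difference between doing expensive IO on 
the watch thread and offloading it, here is a sketch with made-up stand-in 
types (not the actual meta-storage watch API):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WatchThreadSketch {
    /** Stand-in for a watch event. */
    interface WatchEvent {
        long revision();
    }

    /** Stand-in listener; called on the single meta-storage watch thread. */
    interface WatchListener {
        void onUpdate(WatchEvent event);
    }

    static final ExecutorService IO_POOL = Executors.newSingleThreadExecutor();

    // Blocking variant: the expensive write runs on the watch thread, so no
    // other watch event (e.g. a lease update) is processed until it finishes.
    static final WatchListener BLOCKING = event -> writeSchemaVersion(event.revision());

    // Offloaded variant: the watch thread is released immediately; ordering
    // of the writes is preserved by the single-threaded IO pool.
    static final WatchListener OFFLOADED = event ->
            IO_POOL.execute(() -> writeSchemaVersion(event.revision()));

    static void writeSchemaVersion(long revision) {
        // Stand-in for the expensive meta-storage write.
    }
}
{code}

Offloading alone weakens the "written before the next event is processed" 
guarantee, which is presumably why the ticket drops the write rather than 
merely making it asynchronous.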



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21205) Don't store table versions in meta-storage in SchemaManager

2024-01-07 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21205:
---
Description: 
The current implementation blocks the meta-storage watch thread until the new 
schema version is written into the meta-storage. This is an expensive IO 
operation, and it might introduce unexpected turbulence.

Some scenarios suffer greatly from it. For example, we can't process lease 
updates while we're writing into the meta-storage, which shouldn't be the case.

  was:TBD


> Don't store table versions in meta-storage in SchemaManager
> ---
>
> Key: IGNITE-21205
> URL: https://issues.apache.org/jira/browse/IGNITE-21205
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current implementation blocks the meta-storage watch thread until the new 
> schema version is written into the meta-storage. This is an expensive IO 
> operation, and it might introduce unexpected turbulence.
> Some scenarios suffer greatly from it. For example, we can't process lease 
> updates while we're writing into the meta-storage, which shouldn't be the case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21205) Don't store table versions in meta-storage in SchemaManager

2024-01-05 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21205:
--

 Summary: Don't store table versions in meta-storage in 
SchemaManager
 Key: IGNITE-21205
 URL: https://issues.apache.org/jira/browse/IGNITE-21205
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov


TBD



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21204) Use shared rocksdb instance for all TX state storages

2024-01-05 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21204:
--

 Summary: Use shared rocksdb instance for all TX state storages
 Key: IGNITE-21204
 URL: https://issues.apache.org/jira/browse/IGNITE-21204
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov


The current implementation uses too many resources if you create multiple 
tables. Table creation time suffers too.

We need to use the same approach as the "rocksdb" storage engine uses.
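
A sketch of what the shared-instance approach could look like at the RocksDB 
Java API level: one database with a column family per table, instead of a 
separate RocksDB instance per table. Paths and names here are made up; this 
is not the actual storage engine code:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.rocksdb.ColumnFamilyDescriptor;
import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.DBOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class SharedTxStateDb {
    static {
        RocksDB.loadLibrary();
    }

    public static void main(String[] args) throws RocksDBException {
        List<ColumnFamilyDescriptor> descriptors = new ArrayList<>();
        descriptors.add(new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY));

        List<ColumnFamilyHandle> handles = new ArrayList<>();

        try (DBOptions options = new DBOptions()
                .setCreateIfMissing(true)
                .setCreateMissingColumnFamilies(true);
             RocksDB db = RocksDB.open(options, "tx-state-db", descriptors, handles)) {
            // Creating a "storage" for a new table is just a new column family,
            // which is much cheaper than opening another RocksDB instance.
            ColumnFamilyHandle table1 = db.createColumnFamily(
                    new ColumnFamilyDescriptor("table-1".getBytes(StandardCharsets.UTF_8)));

            db.put(table1, "tx-id-1".getBytes(StandardCharsets.UTF_8),
                    "COMMITTED".getBytes(StandardCharsets.UTF_8));

            table1.close();
            handles.forEach(ColumnFamilyHandle::close);
        }
    }
}
{code}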



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21196) PrimaryReplicaEvent handling is inefficient

2024-01-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21196:
---
Reviewer: Vladislav Pyatkov  (was: Vladislav Pyatkov)

> PrimaryReplicaEvent handling is inefficient
> ---
>
> Key: IGNITE-21196
> URL: https://issues.apache.org/jira/browse/IGNITE-21196
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, every partition replica listener has its own set of instances of 
> {{PrimaryReplicaEvent}} listeners. In 
> {{LeaseTracker.UpdateListener#onUpdate}} we create these events in a loop.
>  
> This results in 2 nested loops, which might be extremely inefficient if we 
> have a lot of replicas on the node. Most iterations will do nothing, because 
> they bail out on the following condition:
> {{!replicationGroupId.equals(evt.groupId())}}
>  
> I suggest subscribing to these events in {{ReplicaManager}} and performing 
> the necessary filtering in advance. Such a change would greatly improve the 
> performance of lease update watch event processing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21196) PrimaryReplicaEvent handling is inefficient

2024-01-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21196:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> PrimaryReplicaEvent handling is inefficient
> ---
>
> Key: IGNITE-21196
> URL: https://issues.apache.org/jira/browse/IGNITE-21196
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, every partition replica listener has its own set of instances of 
> {{PrimaryReplicaEvent}} listeners. In 
> {{LeaseTracker.UpdateListener#onUpdate}} we create these events in a loop.
>  
> This results in 2 nested loops, which might be extremely inefficient if we 
> have a lot of replicas on the node. Most iterations will do nothing, because 
> they bail out on the following condition:
> {{!replicationGroupId.equals(evt.groupId())}}
>  
> I suggest subscribing to these events in {{ReplicaManager}} and performing 
> the necessary filtering in advance. Such a change would greatly improve the 
> performance of lease update watch event processing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21198) Optimize memory usage of AbstractEventProducer#fireEvent

2024-01-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21198:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Optimize memory usage of AbstractEventProducer#fireEvent
> 
>
> Key: IGNITE-21198
> URL: https://issues.apache.org/jira/browse/IGNITE-21198
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In the current implementation, most listeners do their work synchronously 
> and return already completed futures. In those cases there's no sense in 
> allocating the entire array of futures and filling it.
> Another reason for not allocating an array right away is that we may have a 
> big number of listeners, so allocating an array would be expensive and 
> wasteful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21198) Optimize memory usage of AbstractEventProducer#fireEvent

2024-01-03 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21198:
--

 Summary: Optimize memory usage of AbstractEventProducer#fireEvent
 Key: IGNITE-21198
 URL: https://issues.apache.org/jira/browse/IGNITE-21198
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov


In the current implementation, most listeners do their work synchronously and 
return already completed futures. In those cases there's no sense in 
allocating the entire array of futures and filling it.

Another reason for not allocating an array right away is that we may have a 
big number of listeners, so allocating an array would be expensive and 
wasteful.
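
A sketch of the lazy-allocation idea (a generic helper, not the actual 
{{AbstractEventProducer}} code): the list of futures is only created when a 
listener returns something that isn't finished yet, so the all-synchronous 
fast path allocates nothing extra:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class LazyFireEvent {
    public static <E> CompletableFuture<Void> fireEvent(
            List<Function<E, CompletableFuture<Void>>> listeners, E event) {
        List<CompletableFuture<Void>> pending = null;

        for (Function<E, CompletableFuture<Void>> listener : listeners) {
            CompletableFuture<Void> fut = listener.apply(event);

            // Collect futures that are still running, and also the already
            // failed ones so that the error is propagated to the caller.
            if (!fut.isDone() || fut.isCompletedExceptionally()) {
                if (pending == null) {
                    pending = new ArrayList<>();
                }

                pending.add(fut);
            }
        }

        return pending == null
                ? CompletableFuture.completedFuture(null)
                : CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]));
    }
}
{code}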



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21196) PrimaryReplicaEvent handling is inefficient

2024-01-03 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21196:
--

 Summary: PrimaryReplicaEvent handling is inefficient
 Key: IGNITE-21196
 URL: https://issues.apache.org/jira/browse/IGNITE-21196
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov


Currently, every partition replica listener has its own set of instances of 
{{PrimaryReplicaEvent}} listeners. In {{LeaseTracker.UpdateListener#onUpdate}} 
we create these events in a loop.
 
This results in 2 nested loops, which might be extremely inefficient if we have 
a lot of replicas on the node. Most iterations will do nothing, because they 
bail out on the following condition:
{{!replicationGroupId.equals(evt.groupId())}}
 
I suggest subscribing to these events in {{ReplicaManager}} and performing the 
necessary filtering in advance. Such a change would greatly improve the 
performance of lease update watch event processing.
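
A sketch of the proposed centralized dispatch, with hypothetical stand-in 
types (not the actual {{ReplicaManager}} API): one subscription point keeps a 
map from group id to listeners, so delivering an event costs a single lookup 
instead of a scan over every replica listener:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class PrimaryReplicaEventDispatcher<G> {
    /** Hypothetical event that knows which replication group it belongs to. */
    public interface GroupEvent<I> {
        I groupId();
    }

    private final Map<G, List<Consumer<GroupEvent<G>>>> listeners = new ConcurrentHashMap<>();

    /** Replica listeners register for their own group only. */
    public void subscribe(G groupId, Consumer<GroupEvent<G>> listener) {
        listeners.computeIfAbsent(groupId, id -> new CopyOnWriteArrayList<>()).add(listener);
    }

    /** Called once per event; filtering happens via the map lookup. */
    public void onEvent(GroupEvent<G> event) {
        List<Consumer<GroupEvent<G>>> groupListeners = listeners.get(event.groupId());

        if (groupListeners != null) {
            groupListeners.forEach(listener -> listener.accept(event));
        }
    }
}
{code}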



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21140) Ignite 3 Disaster Recovery

2023-12-22 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21140:
--

 Summary: Ignite 3 Disaster Recovery
 Key: IGNITE-21140
 URL: https://issues.apache.org/jira/browse/IGNITE-21140
 Project: Ignite
  Issue Type: Epic
Reporter: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-19819) Lease batches compaction

2023-12-21 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-19819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799313#comment-17799313
 ] 

Ivan Bessonov commented on IGNITE-19819:


Please also consider a more space-efficient way to pack values into byte[]; I 
have a quick POC here: [https://github.com/apache/ignite-3/pull/2976]

The payload in the POC is about 10 times smaller than in main. Integrating 
similar ideas into the proposed improvement could make it better.

> Lease batches compaction
> 
>
> Key: IGNITE-19819
> URL: https://issues.apache.org/jira/browse/IGNITE-19819
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>
> *Motivation* 
> After IGNITE-19578, leases should be stored as a single batch in the meta 
> storage. However, the size of such a batch is significant and can be reduced.
> Each lease contains a group name, a leaseholder name, left and right 
> timestamps, and a couple of boolean flags.
> Many leases share the same leaseholder. Also, many leases share the same 
> right border, as batches of leases are renewed on every iteration of the 
> lease updater and get the same right border.
> So, the compacted data structure for all leases could be a map
> {code:java}
> right border -> set of leaseholders -> set of leases which contain only group 
> name, left border and flags.{code}
> It is important that this data structure applies only to the meta storage 
> representation; the in-memory representation of leases should remain the same.
> *Definition of done*
> The amount of space required for storing leases is significantly reduced.
> *Implementation notes*
> The key should be prefix + right border. On each iteration the corresponding 
> right border should be removed and a new one put, so each iteration performs 
> just one meta storage invoke.
> To avoid the ABA problem during lease updates via invokes, entries should be 
> versioned; this can be done by assigning a unique version to each right 
> border key. There are cases when all leaseholders can be removed from some 
> entry and then other leaseholders added again (e.g. accepting leases and 
> removing them from a right border that matches long-term unaccepted leases, 
> and later adding the regular leases to the same right border). In these cases 
> the entry should not be removed from the meta storage, even though it has no 
> leases, in order to preserve the version of the entry.
> To avoid merging of batches, lease prolongation should be changed: the new 
> right border should be calculated as current_right_border + lease_interval 
> (now: current_time + lease_interval).
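
For illustration, a sketch of only the grouping step of the proposed layout 
(no serialization or meta storage invokes, and a simplified {{Lease}} class 
standing in for the real one):

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LeaseCompactionSketch {
    /** Simplified stand-in for the real lease. */
    record Lease(String groupName, String leaseholder, long left, long right, boolean accepted) {}

    /** right border -> leaseholder -> leases sharing both, per the proposed map. */
    static Map<Long, Map<String, Set<Lease>>> groupForStorage(List<Lease> leases) {
        Map<Long, Map<String, Set<Lease>>> grouped = new HashMap<>();

        for (Lease lease : leases) {
            grouped.computeIfAbsent(lease.right(), border -> new HashMap<>())
                    .computeIfAbsent(lease.leaseholder(), holder -> new HashSet<>())
                    .add(lease);
        }

        // Each right border and leaseholder name is now stored once per group
        // of leases instead of once per lease.
        return grouped;
    }
}
{code}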



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21091) Send indexes data during full state transfer

2023-12-15 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21091:
--

 Summary: Send indexes data during full state transfer
 Key: IGNITE-21091
 URL: https://issues.apache.org/jira/browse/IGNITE-21091
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


The current full state transfer implementation dictates that the receiver will 
build all secondary indexes on the fly. This might not be efficient:
 * the receiver will have to extract index tuples from each row version
 * for rows with multiple versions, many of these tuples will be the same, 
which means that the "extraction" and "insertion" will be performed several 
times for the same value

Of course, it all depends on how many versions each row has, but generally 
speaking, inserting data one time is always better than inserting it multiple 
times (a toy model follows at the end of this description).

It is proposed to send indexes the same way as we send version chains - by 
using a scan operation and copy-on-write semantics during data modifications 
(on the sender).

The specifics of the algorithm are not clear yet. This issue includes an 
investigation of the proper approach, one that will eliminate
 * the possibility of data inconsistencies
 * data leaks
 * excessive memory consumption on the sender
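
A toy model of the redundancy (plain Java, no Ignite types): three versions 
of one row share the same indexed value, so on-the-fly rebuilding extracts 
and inserts the same index tuple three times, while only one distinct entry 
ends up in the index:

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IndexRebuildCost {
    public static void main(String[] args) {
        // Hypothetical version chain where the indexed column never changed.
        List<String> indexedValuesPerVersion = List.of("alice", "alice", "alice");

        int extractions = 0;
        Set<String> index = new HashSet<>();

        for (String tuple : indexedValuesPerVersion) {
            extractions++;    // a tuple is extracted for every row version
            index.add(tuple); // an insertion is attempted for every row version
        }

        System.out.println("extractions/insertions attempted: " + extractions); // 3
        System.out.println("distinct index entries: " + index.size());          // 1
    }
}
{code}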



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21076) Creating the table with 1024 partitions is too slow

2023-12-13 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21076:
--

 Summary: Creating the table with 1024 partitions is too slow
 Key: IGNITE-21076
 URL: https://issues.apache.org/jira/browse/IGNITE-21076
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


With the default 25 partitions, creating a table takes a few seconds. While not 
ideal, it's not too bad. But when increasing the number of partitions to 1024 
(the default in Ignite 2), the time increases to about 15-20 seconds.

Even 1024 is a fairly small number of partitions, so the time shouldn't scale 
this drastically. Most of the time spent on creating a single partition should 
go to a) creating the storage and b) leader election. Assuming that, the times 
for 25 and 1024 partitions should not differ this much.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21063:
--

Assignee: Ivan Bessonov

> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM after a while, managing to create about 500 tables locally. 
> We need to research why it happens. Is there a leak, or do we simply use too 
> much memory?
> The main candidate: thread-local marshallers. For some reason we use too many 
> threads, I guess? Meta-storage entries may be up to several megabytes in the 
> current implementation.
> We should limit the size of cached buffers, and the number of threads in 
> general. A shared pool (priority queue) of pre-allocated buffers would solve 
> the issue; the buffers don't have to be thread-local. It's a bit slower, but 
> that's not a problem until proven otherwise.
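
A sketch of the shared pool idea: one priority queue of pre-allocated buffers 
ordered by capacity, with a cap on what gets retained. This is not the actual 
marshaller code, and buffer bookkeeping (positions, limits, direct buffers) is 
left out:

{code:java}
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class SharedBufferPool {
    /** Largest buffer first, so the head is the best candidate for any request. */
    private final PriorityBlockingQueue<ByteBuffer> pool = new PriorityBlockingQueue<>(
            16, Comparator.<ByteBuffer>comparingInt(ByteBuffer::capacity).reversed());

    private final int maxPooledCapacity;

    public SharedBufferPool(int maxPooledCapacity) {
        this.maxPooledCapacity = maxPooledCapacity;
    }

    /** Borrows a buffer of at least the requested capacity, allocating if none fits. */
    public ByteBuffer acquire(int minCapacity) {
        ByteBuffer largest = pool.poll();

        if (largest != null && largest.capacity() >= minCapacity) {
            largest.clear();
            return largest;
        }

        if (largest != null) {
            pool.offer(largest); // the largest pooled buffer is too small - put it back
        }

        return ByteBuffer.allocate(minCapacity);
    }

    /** Returns a buffer to the pool, dropping oversized ones to bound cached memory. */
    public void release(ByteBuffer buffer) {
        if (buffer.capacity() <= maxPooledCapacity) {
            pool.offer(buffer);
        }
    }
}
{code}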



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21063:
---
Description: 
Fails with OOM after a while, managing to create about 500 tables locally. We 
need to research why it happens. Is there a leak, or do we simply use too much 
memory?

The main candidate: thread-local marshallers. For some reason we use too many 
threads, I guess? Meta-storage entries may be up to several megabytes in the 
current implementation.

We should limit the size of cached buffers, and the number of threads in 
general. A shared pool (priority queue) of pre-allocated buffers would solve 
the issue; the buffers don't have to be thread-local. It's a bit slower, but 
that's not a problem until proven otherwise.

  was:Fails with OOM after a while, managing to create about 500 tables 
locally. We need to research why it happens. Is there a leak, or do we simply 
use too much memory?


> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM after a while, managing to create about 500 tables locally. 
> We need to research why it happens. Is there a leak, or do we simply use too 
> much memory?
> The main candidate: thread-local marshallers. For some reason we use too many 
> threads, I guess? Meta-storage entries may be up to several megabytes in the 
> current implementation.
> We should limit the size of cached buffers, and the number of threads in 
> general. A shared pool (priority queue) of pre-allocated buffers would solve 
> the issue; the buffers don't have to be thread-local. It's a bit slower, but 
> that's not a problem until proven otherwise.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21063:
---
Description: Fails with OOM after a while, managing to create about 500 
tables locally. We need to research why it happens. Is there a leak, or do we 
simply use too much memory?  (was: Fails with OOM on TC. We need to research 
why it happens. Is there a leak, or do we simply use too much memory?)

> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM after a while, managing to create about 500 tables locally. 
> We need to research why it happens. Is there a leak, or do we simply use too 
> much memory?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21063:
---
Description: Fails with OOM on TC. We need to research why it happens. Is 
there a leak, or do we simply use too much memory?  (was: Fails with OOM on TC)

> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM on TC. We need to research why it happens. Is there a leak, 
> or do we simply use too much memory?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21063:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Cannot create 1000 tables
> -
>
> Key: IGNITE-21063
> URL: https://issues.apache.org/jira/browse/IGNITE-21063
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Fails with OOM on TC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21063) Cannot create 1000 tables

2023-12-12 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21063:
--

 Summary: Cannot create 1000 tables
 Key: IGNITE-21063
 URL: https://issues.apache.org/jira/browse/IGNITE-21063
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


Fails with OOM on TC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21062) Safe time reordering in partitions

2023-12-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21062:
--

Assignee: Ivan Bessonov

> Safe time reordering in partitions
> --
>
> Key: IGNITE-21062
> URL: https://issues.apache.org/jira/browse/IGNITE-21062
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> In the scenario of creating a lot of tables on a (presumably) slow system, 
> it's possible to notice the {{Safe time reordering detected 
> [current=...}} assertion error in the logs.
> It happens with safe-time sync commands, in the absence of transactional load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21062) Safe time reordering in partitions

2023-12-12 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21062:
--

 Summary: Safe time reordering in partitions
 Key: IGNITE-21062
 URL: https://issues.apache.org/jira/browse/IGNITE-21062
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


In the scenario of creating a lot of tables on a (presumably) slow system, 
it's possible to notice the {{Safe time reordering detected [current=...}} 
assertion error in the logs.

It happens with safe-time sync commands, in the absence of transactional load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

