[jira] [Comment Edited] (IGNITE-18595) Implement index build process during the full state transfer

2024-02-11 Thread Kirill Tkalenko (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816507#comment-17816507
 ] 

Kirill Tkalenko edited comment on IGNITE-18595 at 2/12/24 7:42 AM:
---

For simplicity, in this ticket we will not deal with dropping an index during a 
full state transfer.

Abbreviations:
* RO - read-only (the READ_ONLY index state)
* AT - activation timestamp of an index status, e.g. AT(STOPPING)
* ACV - active catalog version at a given timestamp, e.g. ACV(beginTs)

Approximate and schematic algorithm for inserting writes (a Java sketch follows the list):
# Collect all indexes (REGISTERED, BUILDING, AVAILABLE and STOPPING) in the catalog version of the transferred snapshot, as well as all removed AVAILABLE indexes (the RO indexes) up to the snapshot catalog version.
# Select indexes by write type:
## WriteCommitted:
*** If the index status is BUILDING, AVAILABLE or STOPPING, we take it right away.
*** For an RO index, we take it only if commitTs < AT(STOPPING).
## WriteIntent:
*** If the index status is BUILDING, AVAILABLE or STOPPING, we take it right away.
*** For an index with status REGISTERED, we take it only if its status at ACV(beginTs) is REGISTERED.
*** For an RO index, we take it only if beginTs < AT(STOPPING).
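
For illustration, here is a minimal Java sketch of these selection rules. The {{IndexStatus}} enum and the timestamp/status parameters are simplified stand-ins, not the real catalog API:

{code:java}
// Sketch only: statuses and timestamps are simplified stand-ins, not the
// actual Ignite 3 catalog API.
enum IndexStatus { REGISTERED, BUILDING, AVAILABLE, STOPPING, READ_ONLY }

final class IndexWriteFilter {
    /** Whether a version committed at commitTs must be put into the index. */
    static boolean takeWriteCommitted(IndexStatus statusAtSnapshot, long commitTs, long stoppingTs) {
        switch (statusAtSnapshot) {
            case BUILDING:
            case AVAILABLE:
            case STOPPING:
                return true; // Taken right away.
            case READ_ONLY:
                return commitTs < stoppingTs; // commitTs < AT(STOPPING).
            default:
                return false; // REGISTERED versions are left to the backfiller.
        }
    }

    /** Whether a write intent of a transaction that began at beginTs must be put into the index. */
    static boolean takeWriteIntent(IndexStatus statusAtSnapshot, IndexStatus statusAtBeginTs,
            long beginTs, long stoppingTs) {
        switch (statusAtSnapshot) {
            case BUILDING:
            case AVAILABLE:
            case STOPPING:
                return true; // Taken right away.
            case REGISTERED:
                return statusAtBeginTs == IndexStatus.REGISTERED; // Status at ACV(beginTs).
            case READ_ONLY:
                return beginTs < stoppingTs; // beginTs < AT(STOPPING).
            default:
                return false;
        }
    }
}
{code}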



[jira] [Commented] (IGNITE-18595) Implement index build process during the full state transfer

2024-02-11 Thread Kirill Tkalenko (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816507#comment-17816507
 ] 

Kirill Tkalenko commented on IGNITE-18595:
--

For simplicity, in this ticket we will not deal with dropping an index during a 
full state transfer.

Abbreviations:
* RO - read-only (the READ_ONLY index state)
* AT - activation timestamp of an index status, e.g. AT(STOPPING)
* ACV - active catalog version at a given timestamp, e.g. ACV(beginTs)

Approximate and schematic algorithm for inserting writes:
# Collect all indexes (REGISTERED, BUILDING, AVAILABLE and STOPPING) in the catalog version of the transferred snapshot, as well as all removed AVAILABLE indexes (the RO indexes) up to the snapshot catalog version.
# Select indexes by write type:
## WriteCommitted:
*** If the index status is BUILDING, AVAILABLE or STOPPING, we take it right away.
*** For an RO index, we take it only if commitTs < AT(STOPPING).
## WriteIntent:
*** If the index status is BUILDING, AVAILABLE or STOPPING, we take it right away.
*** For an index with status REGISTERED, we take it only if its status at ACV(beginTs) is REGISTERED.
*** For an RO index, we take it only if beginTs < AT(STOPPING).

> Implement index build process during the full state transfer
> 
>
> Key: IGNITE-18595
> URL: https://issues.apache.org/jira/browse/IGNITE-18595
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Kirill Tkalenko
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> Before starting to accept tuples during a full state transfer, we should take 
> the list of all the indices of the table in question that are in states 
> between REGISTERED and READ_ONLY at the Catalog version passed with the full 
> state transfer. Let’s put them in the *CurrentIndices* list.
> Then, for each tuple version we accept:
>  # If it’s committed, only consider indices from *CurrentIndices* that are 
> not in the REGISTERED state now. We don’t need to index committed versions 
> for REGISTERED indices as they will be indexed by the backfiller (after the 
> index switches to BACKFILLING). For each remaining index in 
> {*}CurrentIndices{*}, put the tuple version to the index if one of the 
> following is true:
>  ## The index state is not READ_ONLY at the snapshot catalog version (so it’s 
> one of BACKFILLING, AVAILABLE, STOPPING) - because these tuples can still be 
> read by both RW and RO transactions via the index
>  ## The index state is READ_ONLY at the snapshot catalog version, but at 
> commitTs it either did not yet exist, or strictly preceded STOPPING (we don’t 
> include tuples committed on STOPPING as, from the point of view of RO 
> transactions, it’s impossible to query such tuples via the index [it is not 
> queryable at those timestamps], new RW transactions don’t see the index, and 
> old RW transactions [that saw it] have already finished)
>  # If it’s a Write Intent, then:
>  ## If the index is in the REGISTERED state at the snapshot catalog version, 
> add the tuple to the index if its transaction was started in the REGISTERED 
> state of the index; otherwise, skip it as it will be indexed by the 
> backfiller.
>  ## If the index is in any of BACKFILLING, AVAILABLE, STOPPING states at the 
> snapshot catalog version, add the tuple to the index
>  ## If the index is in READ_ONLY state at the snapshot catalog version, add 
> the tuple to the index only if the transaction had been started before the 
> index switched from the AVAILABLE state (this is to index a write intent from 
> a finished, but not yet cleaned up, transaction)
> Unlike the Backfiller operation, during a full state transfer we don’t need 
> to use the Write Intent resolution procedure, as races with transaction 
> cleanup are not possible; we just index the Write Intent. If, after the 
> partition replica goes online, it gets a cleanup request with ABORT, it will 
> clean the index itself.
> If the initial state of an index during the full state transfer was 
> BACKFILLING and, while accepting the full state transfer, we saw that the 
> index was dropped (and moved to the [deleted] pseudostate), we should stop 
> writing to that index (and allow it to be destroyed on that partition).
> If we start a full state transfer on a partition for which an index is being 
> built (so the index is in the BACKFILLING state), we’ll index the accepted 
> tuples according to the rules above. After the full state transfer finishes, 
> we’ll start getting ‘add this batch to the index’ commands from the RAFT log 
> (as the Backfiller emits them during the backfilling process); we can either 
> ignore or reapply them. To ignore them, we can raise a special flag in the 
> index storage when finishing a full state transfer that started with the 
> index in the BACKFILLING state.
> h1. Old version
> Here there 
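
As a hedged illustration of the 'special flag' idea from the last paragraph of the quoted description, a sketch could look roughly like the following. All names here are hypothetical, not the actual Ignite 3 index storage API:

{code:java}
// Hypothetical sketch of the flag raised when a full state transfer finishes
// for an index that was in the BACKFILLING state when the transfer started.
class IndexMetaSketch {
    private volatile boolean builtViaFullStateTransfer;

    /** Called when the full state transfer completes on this partition. */
    void onFullStateTransferFinished(boolean indexWasBackfilling) {
        if (indexWasBackfilling) {
            builtViaFullStateTransfer = true;
        }
    }

    /** 'Add this batch to the index' RAFT commands become no-ops once the flag is set. */
    boolean skipBuildCommand() {
        return builtViaFullStateTransfer;
    }
}
{code}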

[jira] [Commented] (IGNITE-21181) Failure to resolve a primary replica after stopping a node

2024-02-11 Thread Alexander Lapin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816499#comment-17816499
 ] 

Alexander Lapin commented on IGNITE-21181:
--

[~Denis Chudov] LGTM, Thanks!

> Failure to resolve a primary replica after stopping a node
> --
>
> Key: IGNITE-21181
> URL: https://issues.apache.org/jira/browse/IGNITE-21181
> Project: Ignite
>  Issue Type: Bug
>Reporter: Roman Puchkovskiy
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The scenario is that the cluster consists of 3 nodes (0, 1, 2). The primary 
> replica of the sole partition is on node 0. Then node 0 is stopped, and an 
> attempt is made to do a put via node 2. The partition still has a majority, 
> but the put results in the following:
>  
> {code:java}
> org.apache.ignite.tx.TransactionException: IGN-REP-5 
> TraceId:55c59c96-17d1-4efc-8e3c-cca81b8b41ad Failed to resolve the primary 
> replica node [consistentId=itrst_ncisasiti_0]
>  
> at 
> org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlist$69(InternalTableImpl.java:1749)
> at 
> java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
> at 
> java.base/java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:946)
> at 
> java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2266)
> at 
> org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.enlist(InternalTableImpl.java:1739)
> at 
> org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.enlistWithRetry(InternalTableImpl.java:480)
> at 
> org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.enlistInTx(InternalTableImpl.java:301)
> at 
> org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.upsert(InternalTableImpl.java:965)
> at 
> org.apache.ignite.internal.table.KeyValueViewImpl.lambda$putAsync$10(KeyValueViewImpl.java:196)
> at 
> org.apache.ignite.internal.table.AbstractTableView.lambda$withSchemaSync$1(AbstractTableView.java:111)
> at 
> java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
> at 
> java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
> at 
> org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:111)
> at 
> org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:102)
> at 
> org.apache.ignite.internal.table.KeyValueViewImpl.putAsync(KeyValueViewImpl.java:193)
> at 
> org.apache.ignite.internal.table.KeyValueViewImpl.put(KeyValueViewImpl.java:185)
> at 
> org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.putToNode(ItTableRaftSnapshotsTest.java:257)
> at 
> org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.putToNode(ItTableRaftSnapshotsTest.java:253)
> at 
> org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.nodeCanInstallSnapshotsAfterSnapshotInstalledToIt(ItTableRaftSnapshotsTest.java:473){code}
>  
> This can be reproduced using 
> ItTableRaftSnapshotsTest#nodeCanInstallSnapshotsAfterSnapshotInstalledToIt().
> The reason is that, according to the test, the leader of the partition group 
> is transferred to node 0, which means that this node will most probably be 
> selected as primary; after that, node 0 is stopped, and then the transaction 
> is started. Node 0 is still the leaseholder in the current time interval, 
> but it has already left the topology.
> We can fix the test to make it await a new primary that is present in the 
> cluster, or add retries on the very first transactional request. In the 
> latter case, we need to ensure that the request is actually the first and 
> only one, and that no other request was sent in any parallel thread; 
> otherwise we can't retry the request on another primary.
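
A hedged sketch of the retry option follows; all names here, including the exception check, are hypothetical (the real code would match the IGN-REP-5 error code rather than a message substring):

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class FirstRequestRetry {
    /** Retries only the first request of a transaction; unsafe once any other request was sent. */
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> firstRequest, int attemptsLeft) {
        return firstRequest.get()
                .handle((res, err) -> {
                    if (err != null && isPrimaryReplicaMiss(err) && attemptsLeft > 1) {
                        // The old primary is gone; re-resolve and re-send the same request.
                        return withRetry(firstRequest, attemptsLeft - 1);
                    }
                    return err == null
                            ? CompletableFuture.completedFuture(res)
                            : CompletableFuture.<T>failedFuture(err);
                })
                .thenCompose(f -> f);
    }

    private static boolean isPrimaryReplicaMiss(Throwable err) {
        // Simplified check for the sketch; ignores CompletionException wrapping.
        String msg = err.getMessage();
        return msg != null && msg.contains("Failed to resolve the primary replica");
    }
}
{code}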





[jira] [Commented] (IGNITE-21316) Add manual schema sync to ItRebalanceDistributedTest

2024-02-11 Thread Roman Puchkovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816492#comment-17816492
 ] 

Roman Puchkovskiy commented on IGNITE-21316:


Thanks!

> Add manual schema sync to ItRebalanceDistributedTest
> 
>
> Key: IGNITE-21316
> URL: https://issues.apache.org/jira/browse/IGNITE-21316
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ItRebalanceDistributedTest creates tables by going directly to the 
> CatalogManager, so it skips schema sync logic. This has to be fixed.
> Also, integration tests are to be written to demonstrate that, for DDLs 
> executed via the SQL API, schema sync is executed correctly.
> h3. The old version of the description follows, but it's incorrect: schema 
> sync is already performed for every SQL query, including DDL ones
> -When executing a DDL operation on one node and then executing another 
> operation (depending on the first operation to finish) on another node, the 
> second operation might not see the results of the first operation.-
> -For example, if we create a zone via node A, wait for the DDL future to 
> complete and then we try to create a table using that new zone via node B, 
> the table creation might fail because node B does not see the newly-created 
> zone yet.-
> -This is because the zone creation future only makes us wait for the 
> activation timestamp to become non-future on all clocks in the cluster, but 
> when this happens, there is no guarantee that all nodes have actually 
> received the new catalog version.-
> -To fix this, we need to do a schema sync for timestamp equal to 'now' before 
> doing any DDL operation.-
> -This should probably be done in the DDL handler (but maybe it makes sense to 
> do it in the `execute()` method of the CatalogManager).-
> -An example of a test demonstrating the problem is 
> ItRebalanceDistributedTest.testOnLeaderElectedRebalanceRestart(). But this 
> test also has another problem: it interacts with the CatalogManager directly. 
> If we add the fix above the CatalogManager, the test will have to be fixed to 
> do schema sync by hand.-
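
A minimal sketch of what 'schema sync by hand' could look like in the test, assuming internal APIs shaped like SchemaSyncService#waitForMetadataCompleteness; the {{TestNode}} accessors are hypothetical:

{code:java}
import java.util.concurrent.TimeUnit;

// Hypothetical test helper: block until the node's catalog metadata is
// complete up to the current hybrid timestamp before touching tables.
static void syncSchemaByHand(TestNode node) throws Exception {
    HybridTimestamp now = node.clock().now(); // hypothetical clock accessor
    node.schemaSyncService().waitForMetadataCompleteness(now).get(10, TimeUnit.SECONDS);
}
{code}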





[jira] [Commented] (IGNITE-21316) Add manual schema sync to ItRebalanceDistributedTest

2024-02-11 Thread Kirill Tkalenko (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816491#comment-17816491
 ] 

Kirill Tkalenko commented on IGNITE-21316:
--

Looks good.






[jira] [Updated] (IGNITE-21509) Make MaxClockSkew configurable

2024-02-11 Thread Roman Puchkovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-21509:
---
Summary: Make MaxClockSkew configurable  (was: Move MaxClockSkew to the 
cluster configuration)

> Make MaxClockSkew configurable
> --
>
> Key: IGNITE-21509
> URL: https://issues.apache.org/jira/browse/IGNITE-21509
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> Currently, MaxClockSkew is hardcoded in HybridTimestamp. It should be 
> configurable by the user, so we need to create a distributed cluster 
> configuration entry for it.
> It needs to be immutable (at least for now, for simplicity); also, its value 
> must not exceed delayDuration (optimally, MaxClockSkew should be less than 
> delayDuration, maybe half of it or smaller).
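
An illustrative check of the stated constraint (the method and parameter names are assumptions, not the actual configuration schema):

{code:java}
// Validates the proposed invariant: maxClockSkew must not exceed delayDuration;
// optimally it should be delayDuration / 2 or smaller.
static void validateMaxClockSkew(long maxClockSkewMs, long delayDurationMs) {
    if (maxClockSkewMs > delayDurationMs) {
        throw new IllegalArgumentException("maxClockSkew must not exceed delayDuration"
                + " [maxClockSkew=" + maxClockSkewMs + "ms, delayDuration=" + delayDurationMs + "ms]");
    }
}
{code}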





[jira] [Resolved] (IGNITE-20641) Entries added via data streamer to persistent cache are not written to cache dump

2024-02-11 Thread Sergey Korotkov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Korotkov resolved IGNITE-20641.
--
Fix Version/s: 2.17
   Resolution: Duplicate

> Entries added via data streamer to persistent cache are not written to cache 
> dump
> -
>
> Key: IGNITE-20641
> URL: https://issues.apache.org/jira/browse/IGNITE-20641
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Korotkov
>Priority: Minor
>  Labels: IEP-109, ise
> Fix For: 2.17
>
>
> Steps to reproduce the problem:
>  * start Ignite with persistence
>  * load some entries via the data streamer
>  * restart Ignite
>  * create a cache dump
>  * check the cache dump consistency
> The consistency check fails with errors like
> {noformat}
> [2023-10-11T12:13:28,711][INFO ][session=427e7c47][CommandHandlerLog] Hash 
> conflicts:
> [2023-10-11T12:13:28,721][INFO ][session=427e7c47][CommandHandlerLog] 
> Conflict partition: PartitionKeyV2 [grpId=-1988013461, grpName=test-cache-1, 
> partId=947]
> [2023-10-11T12:13:28,725][INFO ][session=427e7c47][CommandHandlerLog] 
> Partition instances: [PartitionHashRecordV2 [isPrimary=false, 
> consistentId=ducker03, updateCntr=null, partitionState=OWNING, size=0, 
> partHash=0, partVerHash=0], PartitionHashRecordV2 [isPrimary=false, 
> consistentId=ducker02, updateCntr=null, partitionState=OWNING, size=48, 
> partHash=731883010, partVerHash=0]]
> {noformat}
> *.dump files on the primary are empty, but those on backups are not.
> ---
> The reason is that, after an Ignite restart, such records are always 
> considered to have been added after the dump creation started in 
> CreateDumpFutureTask::isAfterStart. That is because entries added via the 
> data streamer have a version equal to isolatedStreamerVer, but 
> isolatedStreamerVer changes on each Ignite restart and is always greater 
> than startVer.
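
A schematic illustration of the described misclassification (versions are simplified to longs; this is not the real CreateDumpFutureTask code):

{code:java}
final class IsAfterStartSketch {
    long startVer;            // version at which dump creation started
    long isolatedStreamerVer; // regenerated on every restart, always > startVer

    /**
     * Streamer-loaded entries carry isolatedStreamerVer, so after a restart
     * they always compare greater than startVer and are wrongly treated as
     * written after the dump started (hence skipped on the primary).
     */
    boolean isAfterStart(long entryVer) {
        return entryVer > startVer;
    }
}
{code}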


