[jira] [Comment Edited] (IGNITE-18595) Implement index build process during the full state transfer
[ https://issues.apache.org/jira/browse/IGNITE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816507#comment-17816507 ]

Kirill Tkalenko edited comment on IGNITE-18595 at 2/12/24 7:42 AM:
---

For simplicity, in this ticket we will not deal with dropping an index during a full state transfer.

Abbreviations:
* RO - read-only (index)
* AT - activation timestamp of an index status
* ACV - ActiveCatalogVersion

Approximate and schematic algorithm for inserting writes:
# Collect all indexes (REGISTERED, BUILDING, AVAILABLE and STOPPING) on the catalog version of the transferred snapshot, as well as all remote AVAILABLE indexes up to the snapshot catalog version.
# Select indexes by write type:
## WriteCommitted:
*** If the index is in status BUILDING, AVAILABLE or STOPPING, we take it right away.
*** For an RO index, we take it only if commitTs < AT(STOPPING).
## WriteIntent:
*** If the index is in status BUILDING, AVAILABLE or STOPPING, we take it right away.
*** For an index with status REGISTERED, we take it only if ACV(beginTs) == REGISTERED.
*** For an RO index, we take it only if beginTs < AT(STOPPING).

> Implement index build process during the full state transfer
> ------------------------------------------------------------
>
>                 Key: IGNITE-18595
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18595
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
> Before starting to accept tuples during a full state transfer, we should take the list of all the indices of the table in question that are in states between REGISTERED and READ_ONLY at the Catalog version passed with the full state transfer. Let's put them in the *CurrentIndices* list.
> Then, for each tuple version we accept:
> # If it's committed, only consider indices from *CurrentIndices* that are not in the REGISTERED state now. We don't need index committed versions for REGISTERED indices, as they will be indexed by the backfiller (after the index switches to BACKFILLING). For each remaining index in {*}CurrentIndices{*}, put the tuple version to the index if one of the following is true:
> ## The index state is not READ_ONLY at the snapshot catalog version (so it's one of BACKFILLING, AVAILABLE, STOPPING) - because these tuples can still be read by both RW and RO transactions via the index.
> ## The index state is READ_ONLY at the snapshot catalog version, but at commitTs it either did not yet exist or strictly preceded STOPPING (we don't include tuples committed on STOPPING as, from the point of view of RO transactions, it's impossible to query such tuples via the index [it is not queryable at those timestamps], new RW transactions don't see the index, and old RW transactions [that saw it] have already finished).
> # If it's a Write Intent, then:
> ## If the index is in the REGISTERED state at the snapshot catalog version, add the tuple to the index if its transaction was started in the REGISTERED state of the index; otherwise, skip it, as it will be indexed by the backfiller.
> ## If the index is in any of the BACKFILLING, AVAILABLE, STOPPING states at the snapshot catalog version, add the tuple to the index.
> ## If the index is in the READ_ONLY state at the snapshot catalog version, add the tuple to the index only if the transaction had been started before the index switched from the AVAILABLE state (this is to index a write intent from a finished, but not yet cleaned up, transaction).
> Unlike the Backfiller operation, during a full state transfer we don't need to use the Write Intent resolution procedure, as races with transaction cleanup are not possible; we just index a Write Intent. If, after the partition replica goes online, it gets a cleanup request with ABORT, it will clean the index itself.
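The write-type selection rules in the comment above can be sketched roughly as follows. This is a toy sketch: `IndexWriteSelector`, `IndexMeta`, the `registeredAt` predicate and the simplified long timestamps are illustrative names and assumptions, not the actual Ignite 3 API.

```java
import java.util.List;
import java.util.function.LongPredicate;
import java.util.stream.Collectors;

/** Sketch of the index-selection rules above; all names here are illustrative, not the Ignite 3 API. */
final class IndexWriteSelector {
    enum Status { REGISTERED, BUILDING, AVAILABLE, STOPPING, READ_ONLY }

    /**
     * @param stoppingActivationTs AT(STOPPING) for an RO index; Long.MAX_VALUE if the index never stopped.
     * @param registeredAt predicate telling whether the index was REGISTERED at a given timestamp (the ACV check).
     */
    record IndexMeta(String name, Status status, long stoppingActivationTs, LongPredicate registeredAt) { }

    /** Indexes that must receive a committed write (WriteCommitted) with the given commitTs. */
    static List<IndexMeta> forCommittedWrite(List<IndexMeta> indexes, long commitTs) {
        return indexes.stream().filter(idx -> switch (idx.status()) {
            case BUILDING, AVAILABLE, STOPPING -> true;               // taken right away
            case READ_ONLY -> commitTs < idx.stoppingActivationTs();  // commitTs < AT(STOPPING)
            case REGISTERED -> false;                                 // left to the backfiller
        }).collect(Collectors.toList());
    }

    /** Indexes that must receive a write intent whose transaction began at beginTs. */
    static List<IndexMeta> forWriteIntent(List<IndexMeta> indexes, long beginTs) {
        return indexes.stream().filter(idx -> switch (idx.status()) {
            case BUILDING, AVAILABLE, STOPPING -> true;               // taken right away
            case REGISTERED -> idx.registeredAt().test(beginTs);      // ACV(beginTs) == REGISTERED
            case READ_ONLY -> beginTs < idx.stoppingActivationTs();   // beginTs < AT(STOPPING)
        }).collect(Collectors.toList());
    }
}
```

Note that a committed write never goes into a REGISTERED index, while a write intent does if the transaction began while the index was REGISTERED, which matches the two branches of the algorithm above.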
[jira] [Commented] (IGNITE-18595) Implement index build process during the full state transfer
[ https://issues.apache.org/jira/browse/IGNITE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816507#comment-17816507 ]

Kirill Tkalenko commented on IGNITE-18595:
--

For simplicity, in this ticket we will not deal with dropping an index during a full state transfer.

Abbreviations:
* RO - read-only (index)
* AT - activation timestamp of an index status
* ACV - ActiveCatalogVersion

Approximate and schematic algorithm for inserting writes:
# Collect all indexes (REGISTERED, BUILDING, AVAILABLE and STOPPING) on the catalog version of the transferred snapshot, as well as all remote AVAILABLE indexes up to the snapshot catalog version.
# Select indexes by write type:
## WriteCommitted:
* If the index is in status BUILDING, AVAILABLE or STOPPING, we take it right away.
* For an RO index, we take it only if commitTs < AT(STOPPING).
## WriteIntent:
* If the index is in status BUILDING, AVAILABLE or STOPPING, we take it right away.
* For an index with status REGISTERED, we take it only if ACV(beginTs) == REGISTERED.
* For an RO index, we take it only if beginTs < AT(STOPPING).

> Implement index build process during the full state transfer
> ------------------------------------------------------------
>
>                 Key: IGNITE-18595
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18595
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
> Before starting to accept tuples during a full state transfer, we should take the list of all the indices of the table in question that are in states between REGISTERED and READ_ONLY at the Catalog version passed with the full state transfer. Let's put them in the *CurrentIndices* list.
> Then, for each tuple version we accept:
> # If it's committed, only consider indices from *CurrentIndices* that are not in the REGISTERED state now. We don't need index committed versions for REGISTERED indices, as they will be indexed by the backfiller (after the index switches to BACKFILLING). For each remaining index in {*}CurrentIndices{*}, put the tuple version to the index if one of the following is true:
> ## The index state is not READ_ONLY at the snapshot catalog version (so it's one of BACKFILLING, AVAILABLE, STOPPING) - because these tuples can still be read by both RW and RO transactions via the index.
> ## The index state is READ_ONLY at the snapshot catalog version, but at commitTs it either did not yet exist or strictly preceded STOPPING (we don't include tuples committed on STOPPING as, from the point of view of RO transactions, it's impossible to query such tuples via the index [it is not queryable at those timestamps], new RW transactions don't see the index, and old RW transactions [that saw it] have already finished).
> # If it's a Write Intent, then:
> ## If the index is in the REGISTERED state at the snapshot catalog version, add the tuple to the index if its transaction was started in the REGISTERED state of the index; otherwise, skip it, as it will be indexed by the backfiller.
> ## If the index is in any of the BACKFILLING, AVAILABLE, STOPPING states at the snapshot catalog version, add the tuple to the index.
> ## If the index is in the READ_ONLY state at the snapshot catalog version, add the tuple to the index only if the transaction had been started before the index switched from the AVAILABLE state (this is to index a write intent from a finished, but not yet cleaned up, transaction).
> Unlike the Backfiller operation, during a full state transfer we don't need to use the Write Intent resolution procedure, as races with transaction cleanup are not possible; we just index a Write Intent. If, after the partition replica goes online, it gets a cleanup request with ABORT, it will clean the index itself.
> If the initial state of an index during the full state transfer was BACKFILLING and, during accepting the full state transfer, we saw that the index was dropped (and moved to the [deleted] pseudostate), we should stop writing to that index (and allow it to be destroyed on that partition).
> If we start a full state transfer on a partition for which an index is being built (so the index is in the BACKFILLING state), we'll index the accepted tuples (according to the rules above). After the full state transfer finishes, we'll start getting 'add this batch to the index' commands from the RAFT log (as the Backfiller emits them during the backfilling process); we can just ignore or reapply them. To ignore them, we can raise a special flag in the index storage when finishing a full state transfer started with the index being in the BACKFILLING state.
> h1. Old version
> Here there
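The "special flag" idea from the last paragraph could be sketched roughly as follows; `BackfillReplayGuard` and its methods are hypothetical names for illustration, not the actual Ignite 3 index storage API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of the "special flag" idea above: once a full state transfer that started
 * with the index in the BACKFILLING state finishes, replayed 'add this batch to the
 * index' commands from the RAFT log are ignored. All names here are hypothetical.
 */
final class BackfillReplayGuard {
    private final AtomicBoolean ignoreBackfillBatches = new AtomicBoolean();

    /** Called when a full state transfer finishes for an index that was BACKFILLING when it started. */
    void onFullStateTransferFinished() {
        ignoreBackfillBatches.set(true);
    }

    /** Whether an 'add this batch to the index' command replayed from the log should be applied. */
    boolean shouldApplyBackfillBatch() {
        return !ignoreBackfillBatches.get();
    }
}
```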
[jira] [Commented] (IGNITE-21181) Failure to resolve a primary replica after stopping a node
[ https://issues.apache.org/jira/browse/IGNITE-21181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816499#comment-17816499 ]

Alexander Lapin commented on IGNITE-21181:
--
[~Denis Chudov] LGTM, Thanks!

> Failure to resolve a primary replica after stopping a node
> ----------------------------------------------------------
>
>                 Key: IGNITE-21181
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21181
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The scenario is that the cluster consists of 3 nodes (0, 1, 2). The primary replica of the sole partition is on node 0. Then node 0 is stopped and an attempt is made to do a put via node 2. The partition still has a majority, but the put results in the following:
> {code:java}
> org.apache.ignite.tx.TransactionException: IGN-REP-5 TraceId:55c59c96-17d1-4efc-8e3c-cca81b8b41ad Failed to resolve the primary replica node [consistentId=itrst_ncisasiti_0]
>     at org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlist$69(InternalTableImpl.java:1749)
>     at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
>     at java.base/java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:946)
>     at java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2266)
>     at org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.enlist(InternalTableImpl.java:1739)
>     at org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.enlistWithRetry(InternalTableImpl.java:480)
>     at org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.enlistInTx(InternalTableImpl.java:301)
>     at org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.upsert(InternalTableImpl.java:965)
>     at org.apache.ignite.internal.table.KeyValueViewImpl.lambda$putAsync$10(KeyValueViewImpl.java:196)
>     at org.apache.ignite.internal.table.AbstractTableView.lambda$withSchemaSync$1(AbstractTableView.java:111)
>     at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
>     at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
>     at org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:111)
>     at org.apache.ignite.internal.table.AbstractTableView.withSchemaSync(AbstractTableView.java:102)
>     at org.apache.ignite.internal.table.KeyValueViewImpl.putAsync(KeyValueViewImpl.java:193)
>     at org.apache.ignite.internal.table.KeyValueViewImpl.put(KeyValueViewImpl.java:185)
>     at org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.putToNode(ItTableRaftSnapshotsTest.java:257)
>     at org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.putToNode(ItTableRaftSnapshotsTest.java:253)
>     at org.apache.ignite.internal.raftsnapshot.ItTableRaftSnapshotsTest.nodeCanInstallSnapshotsAfterSnapshotInstalledToIt(ItTableRaftSnapshotsTest.java:473){code}
> This can be reproduced using ItTableRaftSnapshotsTest#nodeCanInstallSnapshotsAfterSnapshotInstalledToIt().
> The reason is that, according to the test, the leadership of the partition group is transferred to node 0, which means that this node will most probably be selected as primary; after that node 0 is stopped, and then the transaction is started. Node 0 is still the leaseholder in the current time interval, but it has already left the topology.
> We can fix the test to make it await a new primary that is present in the cluster, or add retries on the very first transactional request. In the latter case, we need to ensure that the request is actually the first and only one, and that no other request was sent in any parallel thread; otherwise we can't retry the request on another primary.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
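The retry-on-first-request option discussed above could look roughly like the sketch below. It is an illustration only: `withRetry` and the message-based error check are hypothetical (the real code would inspect the IGN-REP-5 error code rather than the message text), and it ignores the single-request precondition the comment raises.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Sketch of retrying the very first transactional request when the cached primary has left the topology. */
final class PrimaryRetry {
    /** Runs the request, re-invoking the supplier (which re-resolves the primary) on resolution failures. */
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> request, int attempts) {
        return request.get().handle((res, err) -> {
            if (err == null) {
                return CompletableFuture.completedFuture(res);
            }
            if (attempts > 1 && isPrimaryResolutionError(err)) {
                return withRetry(request, attempts - 1); // re-resolve the primary and retry
            }
            CompletableFuture<T> failed = new CompletableFuture<>();
            failed.completeExceptionally(err);
            return failed;
        }).thenCompose(f -> f); // flatten the nested future
    }

    private static boolean isPrimaryResolutionError(Throwable t) {
        Throwable cause = (t.getCause() != null) ? t.getCause() : t;
        String msg = cause.getMessage();
        // Hypothetical check; real code would match the IGN-REP-5 error code instead of the message.
        return msg != null && msg.contains("Failed to resolve the primary replica");
    }
}
```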
[jira] [Commented] (IGNITE-21316) Add manual schema sync to ItRebalanceDistributedTest
[ https://issues.apache.org/jira/browse/IGNITE-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816492#comment-17816492 ]

Roman Puchkovskiy commented on IGNITE-21316:
--
Thanks!

> Add manual schema sync to ItRebalanceDistributedTest
> ----------------------------------------------------
>
>                 Key: IGNITE-21316
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21316
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> ItRebalanceDistributedTest creates tables by going directly to the CatalogManager, so it skips the schema sync logic. This has to be fixed.
> Also, integration tests are to be written to demonstrate that, for DDLs executed via the SQL API, schema sync is executed correctly.
> h3. Old version of the description follows, but it's incorrect: schema sync is already made for every SQL query, including DDL ones
> -When executing a DDL operation on one node and then executing another operation (depending on the first operation to finish) on another node, the second operation might not see the first operation's results.-
> -For example, if we create a zone via node A, wait for the DDL future to complete and then try to create a table using that new zone via node B, the table creation might fail because node B does not see the newly created zone yet.-
> -This is because the zone creation future only makes us wait for the activation timestamp to become non-future on all clocks in the cluster, but when this happens, there is no guarantee that all nodes have actually received the new catalog version.-
> -To fix this, we need to do a schema sync for a timestamp equal to 'now' before doing any DDL operation.-
> -This should probably be done in the DDL handler (but maybe it makes sense to do it in the `execute()` method of the CatalogManager).-
> -An example of a test demonstrating the problem is ItRebalanceDistributedTest.testOnLeaderElectedRebalanceRestart(). But this test also has another problem: it interacts with the CatalogManager directly. If we add the fix above the CatalogManager, the test will have to be fixed to do schema sync by hand.-
[jira] [Commented] (IGNITE-21316) Add manual schema sync to ItRebalanceDistributedTest
[ https://issues.apache.org/jira/browse/IGNITE-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816491#comment-17816491 ]

Kirill Tkalenko commented on IGNITE-21316:
--
Looks good.
[jira] [Updated] (IGNITE-21509) Make MaxClockSkew configurable
[ https://issues.apache.org/jira/browse/IGNITE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Puchkovskiy updated IGNITE-21509:
---
    Summary: Make MaxClockSkew configurable  (was: Move MaxClockSkew to the cluster configuration)

> Make MaxClockSkew configurable
> ------------------------------
>
>                 Key: IGNITE-21509
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21509
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
> Currently, MaxClockSkew is hardcoded in HybridTimestamp. It should be configurable by the user, so we need to create a distributed cluster configuration entry for it.
> It needs to be immutable (at least for now, for simplicity); also, its value must not exceed delayDuration (optimally, MaxClockSkew should be less than delayDuration, maybe half of it or smaller).
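The constraint in the last paragraph (MaxClockSkew must not exceed delayDuration) is easy to express as a configuration validation step. The sketch below uses hypothetical names and plain millisecond longs, not Ignite's actual configuration framework.

```java
/** Sketch of the constraint described above: maxClockSkew must not exceed delayDuration. */
final class ClockSkewConfigValidator {
    static void validate(long maxClockSkewMs, long delayDurationMs) {
        if (maxClockSkewMs < 0) {
            throw new IllegalArgumentException("maxClockSkew must be non-negative: " + maxClockSkewMs);
        }
        if (maxClockSkewMs > delayDurationMs) {
            throw new IllegalArgumentException("maxClockSkew (" + maxClockSkewMs
                    + " ms) must not exceed delayDuration (" + delayDurationMs + " ms)");
        }
    }
}
```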
[jira] [Resolved] (IGNITE-20641) Entries added via data streamer to persistent cache are not written to cache dump
[ https://issues.apache.org/jira/browse/IGNITE-20641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Korotkov resolved IGNITE-20641.
--
    Fix Version/s: 2.17
       Resolution: Duplicate

> Entries added via data streamer to persistent cache are not written to cache dump
> ---------------------------------------------------------------------------------
>
>                 Key: IGNITE-20641
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20641
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Korotkov
>            Priority: Minor
>              Labels: IEP-109, ise
>             Fix For: 2.17
>
> Steps to reproduce the problem:
> * start Ignite with persistence
> * load some entries via the data streamer
> * restart Ignite
> * create a cache dump
> * check the cache dump consistency
> The consistency check fails with errors like:
> {noformat}
> [2023-10-11T12:13:28,711][INFO ][session=427e7c47][CommandHandlerLog] Hash conflicts:
> [2023-10-11T12:13:28,721][INFO ][session=427e7c47][CommandHandlerLog] Conflict partition: PartitionKeyV2 [grpId=-1988013461, grpName=test-cache-1, partId=947]
> [2023-10-11T12:13:28,725][INFO ][session=427e7c47][CommandHandlerLog] Partition instances: [PartitionHashRecordV2 [isPrimary=false, consistentId=ducker03, updateCntr=null, partitionState=OWNING, size=0, partHash=0, partVerHash=0], PartitionHashRecordV2 [isPrimary=false, consistentId=ducker02, updateCntr=null, partitionState=OWNING, size=48, partHash=731883010, partVerHash=0]]
> {noformat}
> The *.dump files on the primary are empty, but on the backups they are not.
> ---
> The reason is that, after an Ignite restart, such records are always considered to have been added after the dump creation started, in CreateDumpFutureTask::isAfterStart. That is because entries added via the data streamer have a version equal to isolatedStreamerVer, but isolatedStreamerVer changes on each Ignite restart and is always greater than startVer.
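The misclassification described above can be illustrated with entry versions simplified to plain longs; `DumpEntryFilter.isAfterStart` is a toy stand-in for the real check in CreateDumpFutureTask, not Ignite's actual version comparison.

```java
/**
 * Toy illustration of the bug described above: cache entry versions are simplified
 * to longs, and isAfterStart stands in for CreateDumpFutureTask::isAfterStart.
 */
final class DumpEntryFilter {
    /** An entry is excluded from the dump if its version is newer than the dump start version. */
    static boolean isAfterStart(long entryVer, long startVer) {
        return entryVer > startVer;
    }
}
```

Because the streamer version is regenerated above `startVer` on every restart, every streamer-loaded entry looks "added after the dump started" and is skipped on the primary, which produces the hash conflicts shown in the log above.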