[jira] [Commented] (IGNITE-11075) Index rebuild procedure over cache partition file

2020-07-07 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152634#comment-17152634
 ] 

Aleksey Plekhanov commented on IGNITE-11075:


Moved to the next release (together with the umbrella ticket)

> Index rebuild procedure over cache partition file
> -
>
> Key: IGNITE-11075
> URL: https://issues.apache.org/jira/browse/IGNITE-11075
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Maxim Muzafarov
> Assignee: Sergey Kalashnikov
> Priority: Major
> Labels: iep-28
> Fix For: 2.9
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> A node can own a partition once the partition data has been rebalanced and the
> cache indexes are ready. With the message-based cluster rebalancing approach,
> indexes are rebuilt simultaneously with cache data loading. With the
> file-based rebalancing approach, the index rebuild procedure must finish
> before the partition state is set to OWNING.
> We need to rebuild local SQL indexes (the {{index.bin}} file) when a partition
> file has been received. The node must support crash-recovery guarantees,
> since the index rebuild is performed on a node that is in the topology.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-11075) Index rebuild procedure over cache partition file

2019-11-25 Thread Sergey Kalashnikov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981517#comment-16981517
 ] 

Sergey Kalashnikov commented on IGNITE-11075:
-

Implemented the following solution (PR 
https://github.com/apache/ignite/pull/7070):

Goals:
- Restart index rebuilds that failed because of a node crash.
- Limit recovery rebuilds to the caches and partitions that did not complete
their rebuild before the crash.
- Provide the ability (and an API) to rebuild cache indexes for arbitrarily
selected partitions.

Design:

1) A set of partition rebuild markers is kept inside the {{index.bin}} file
(i.e. persisted).
For that purpose, a new {{"IndexRebuildMarkers"}} tree is introduced.

Item size for a shared cache group: 6 bytes (4 for cacheId and 2 for partition)
Item size for a single cache: 2 bytes (just partition)

So, for a node with 2000 local partitions, it takes 6 (shared group) or
1 (single cache) additional page(s) per cache.
For an extreme case of 65500 local partitions per node, it takes 194 or 64
pages per cache.
However, this tree is normally empty (requiring only 1 page) and takes up
space only while an index rebuild is in progress.

2) Before the index rebuild starts:
- Store the ids of the partitions that will be rebuilt into {{index.bin}}.
- Log a new WAL record, {{START_BUILD_INDEX_RECORD}}, to protect the new
information from a crash before the first checkpoint.

3) After successful completion of each partition rebuild:
- Remove the partition id from the {{"IndexRebuildMarkers"}} tree.

4) On memory recovery:
- If a {{START_BUILD_INDEX_RECORD}} is encountered during logical record
recovery, store the partitions from the record into {{index.bin}}, unless the
file was removed.

5) On cache start:
- Check if {{index.bin}} exists for a cache-group and then retrieve partition 
build markers from the {{"IndexRebuildMarkers"}} tree.
- Start index-rebuild for the marked partitions.

6) A new API is provided for use by P2P rebalance:

{{public IgniteInternalFuture rebuildIndexesByPartition(CacheGroupContext grp, int partId);}}
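
The marker lifecycle in steps 2 to 5 can be pictured with a small in-memory model. This is only a sketch, not code from the PR: the class {{RebuildMarkerSketch}} and all of its method names are hypothetical, a sorted set stands in for the {{"IndexRebuildMarkers"}} tree, a list stands in for the WAL, and checkpoints are not modelled. The 6-byte item layout assumes cacheId comes first and the partition second; the actual field order is not stated above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative in-memory model of the marker lifecycle; not Ignite code. */
class RebuildMarkerSketch {
    /** Stand-in for the persisted "IndexRebuildMarkers" tree in index.bin. */
    final Set<Integer> markers = new TreeSet<>();

    /** Stand-in for the WAL: each entry models one START_BUILD_INDEX_RECORD. */
    final List<Set<Integer>> wal = new ArrayList<>();

    /** Step 2: store the partition ids, then log them to the WAL. */
    void beforeRebuild(Set<Integer> partIds) {
        markers.addAll(partIds);
        wal.add(new TreeSet<>(partIds));
    }

    /** Step 3: a partition finished rebuilding, so its marker is removed. */
    void onPartitionRebuilt(int partId) {
        markers.remove(partId);
    }

    /** Step 4: replay logged records into the markers unless index.bin was removed. */
    void onMemoryRecovery(boolean indexFileExists) {
        if (!indexFileExists)
            return;

        for (Set<Integer> rec : wal)
            markers.addAll(rec);
    }

    /** Step 5: on cache start, the marked partitions are the ones to rebuild. */
    Set<Integer> partitionsToRebuildOnStart(boolean indexFileExists) {
        return indexFileExists ? new TreeSet<>(markers) : new TreeSet<>();
    }

    /** Shared-group item: 4 bytes of cacheId followed by 2 bytes of partition id (assumed order). */
    static byte[] encodeSharedGroupMarker(int cacheId, int partId) {
        // The cast keeps the low 16 bits, so ids up to 65535 fit.
        return ByteBuffer.allocate(6).putInt(cacheId).putShort((short)partId).array();
    }

    /** Reads the partition id back as an unsigned 16-bit value. */
    static int decodePartition(byte[] item) {
        return ByteBuffer.wrap(item).getShort(4) & 0xFFFF;
    }
}
```

In this simplified model, replaying the record after a crash marks every logged partition again, including the ones that had already finished. That is a property of the sketch rather than a statement about the PR, and it only costs a redundant rebuild.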




