[ 
https://issues.apache.org/jira/browse/IGNITE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17777674#comment-17777674
 ] 

Aleksey Plekhanov edited comment on IGNITE-20697 at 10/20/23 9:48 AM:
----------------------------------------------------------------------

[~ktkale...@gridgain.com], sure, I plan to create an IEP and write to the 
dev list, but first I want to create a POC.
{quote}It also turns out that if users do not gracefully shut down the cluster 
before switching to a new version of Ignite, they may experience problems 
starting nodes, since there will be a new data recovery mechanism.
{quote}
I suppose we will provide both mechanisms and allow users to configure which 
one is used. In the next release we can use physical records by default, and in 
the following release we can switch the default to the checkpoint delta file. 
On recovery, Ignite can decide what to do by analyzing the files of the current 
checkpoint. 
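A minimal sketch of that last point, under stated assumptions: the recovery path is derived from what is on disk for the last checkpoint rather than from the configured mode, so a node restarted on a new version after a non-graceful shutdown could still recover a checkpoint that was written the old way. `RecoveryChooser`, `Action`, and the three boolean inputs (whether a checkpoint end marker exists, whether a delta file exists, and whether that delta file is complete) are hypothetical names for illustration, not Ignite APIs.

```java
/** Hypothetical sketch: choosing a recovery path from on-disk state, not Ignite code. */
final class RecoveryChooser {
    enum Action {
        NONE,                 // checkpoint finished: only logical records need replaying
        REPLAY_WAL_PHYSICAL,  // old mode: restore pages from physical WAL records
        REDO_FROM_DELTA_FILE, // new mode: delta file complete, page store may hold torn pages
        DISCARD_DELTA_FILE    // new mode: crashed while writing the delta file, store untouched
    }

    static Action choose(boolean cpEndMarkerExists, boolean deltaFileExists, boolean deltaFileComplete) {
        if (cpEndMarkerExists)
            return Action.NONE; // a leftover delta file, if any, can simply be deleted
        if (!deltaFileExists)
            // Written in physical-records mode, or a new-mode checkpoint that crashed
            // before its delta file was created (then there is nothing to replay).
            return Action.REPLAY_WAL_PHYSICAL;
        return deltaFileComplete ? Action.REDO_FROM_DELTA_FILE : Action.DISCARD_DELTA_FILE;
    }
}
```

Keeping the decision independent of the configuration means switching the default between releases only changes how new checkpoints are written, not how old ones are read.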



> Move physical records from WAL to another storage 
> --------------------------------------------------
>
>                 Key: IGNITE-20697
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20697
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>
> Currently, physical records take up most of the WAL size. But physical 
> records in WAL files are required only for crash recovery, and these records 
> are useful only for a short period of time (since the last checkpoint). 
> The total size of physical records logged between two checkpoints is larger 
> than the total size of the pages modified in that interval, since we need to 
> store a page snapshot record for each modified page, plus page delta records 
> if a page is modified more than once between checkpoints.
> In a stable workflow (without crashes or rebalances), we process WAL files 
> several times:
>  # We write records to WAL files
>  # We copy WAL files to the archive
>  # We compact WAL files (remove physical records and compress)
> So, in total, we write all physical records twice and read them at least 
> twice.
> To reduce disk workload, we can move physical records to another storage and 
> stop writing them to WAL files. To provide the same crash recovery 
> guarantees, we can write modified pages twice during a checkpoint: first to a 
> delta file, and then to the page storage. In this case, if we crash while 
> writing to the page storage, we can recover any page from the delta file 
> (instead of from the WAL, as we do now).
> This proposal has pros and cons.
> Pros:
>  - Less stored data (we don't store page delta records, only the final state 
> of each page)
>  - Reduced disk workload (we additionally write all modified pages once, 
> instead of writing a larger amount of data twice and reading it twice)
>  - Potentially reduced latency (instead of writing physical records 
> synchronously during data modification, we write only logical records to the 
> WAL, and physical pages are written by checkpointer threads)
> Cons:
>  - Increased checkpoint duration (we have to write twice as much data during 
> a checkpoint)
> Let's try to implement it and benchmark it.
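The checkpoint write path and recovery described in the issue can be sketched as follows. This is a minimal, hypothetical illustration, not Ignite code: `DeltaFileCheckpoint`, the 4096-byte page size, the delta record layout (an 8-byte page index followed by the page bytes) and the `-1` end marker are all assumptions. The invariant it shows is that the delta file is fully written and fsynced before the first in-place page write, so a page torn by a crash in the page store can always be redone from the delta file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a delta-file checkpoint; not Ignite's actual API. */
final class DeltaFileCheckpoint {
    static final int PAGE_SIZE = 4096;  // assumed fixed page size
    static final long END_MARKER = -1L; // assumed "delta file complete" marker

    /** Writes dirty pages to the delta file, then to the page store, then drops the delta file. */
    static void checkpoint(Path delta, Path store, Map<Long, byte[]> dirtyPages) throws IOException {
        writeDelta(delta, dirtyPages);   // 1st write: sequential, fsynced before touching the store
        applyToStore(store, dirtyPages); // 2nd write: in-place, may be torn by a crash
        Files.delete(delta);             // store is durable, the delta file is no longer needed
    }

    static void writeDelta(Path delta, Map<Long, byte[]> pages) throws IOException {
        try (FileChannel ch = FileChannel.open(delta, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            for (Map.Entry<Long, byte[]> e : pages.entrySet()) {
                if (e.getValue().length != PAGE_SIZE)
                    throw new IllegalArgumentException("page must be PAGE_SIZE bytes");
                ByteBuffer rec = ByteBuffer.allocate(Long.BYTES + PAGE_SIZE);
                rec.putLong(e.getKey());
                rec.put(e.getValue());
                rec.flip();
                while (rec.hasRemaining()) ch.write(rec);
            }
            ByteBuffer end = ByteBuffer.allocate(Long.BYTES);
            end.putLong(END_MARKER);
            end.flip();
            while (end.hasRemaining()) ch.write(end);
            ch.force(true); // the delta file must be durable before the store is modified
        }
    }

    static void applyToStore(Path store, Map<Long, byte[]> pages) throws IOException {
        try (FileChannel ch = FileChannel.open(store, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (Map.Entry<Long, byte[]> e : pages.entrySet()) {
                ByteBuffer page = ByteBuffer.wrap(e.getValue());
                long pos = e.getKey() * PAGE_SIZE;
                while (page.hasRemaining()) pos += ch.write(page, pos);
            }
            ch.force(true);
        }
    }

    /** On node start: redoes page writes that may have been torn by a crash. */
    static void recover(Path delta, Path store) throws IOException {
        if (!Files.exists(delta))
            return; // no checkpoint was in progress
        Map<Long, byte[]> pages = readCompleteDelta(delta);
        if (pages != null)
            applyToStore(store, pages); // crashed while writing the store: redo from the delta
        // pages == null: crashed while writing the delta, the store was not touched yet
        Files.delete(delta);
    }

    /** Returns the pages of a fully written delta file, or null if it is truncated. */
    static Map<Long, byte[]> readCompleteDelta(Path delta) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(delta));
        Map<Long, byte[]> pages = new HashMap<>();
        while (buf.remaining() >= Long.BYTES) {
            long pageIdx = buf.getLong();
            if (pageIdx == END_MARKER)
                return pages;
            if (buf.remaining() < PAGE_SIZE)
                return null;
            byte[] page = new byte[PAGE_SIZE];
            buf.get(page);
            pages.put(pageIdx, page);
        }
        return null; // the end marker was never written
    }
}
```

Because `recover` only rewrites whole pages, it is idempotent: a crash during recovery, or after the page store fsync but before the delta file is deleted, just causes the same pages to be written again on the next start. A production version would also need per-record checksums and an fsync of the directory after deleting the delta file; both are omitted from this sketch.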



--
This message was sent by Atlassian Jira
(v8.20.10#820010)