[jira] [Commented] (IGNITE-20697) Move physical records from WAL to another storage

2023-10-20 Thread Kirill Tkalenko (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1697#comment-1697
 ] 

Kirill Tkalenko commented on IGNITE-20697:
--

[~alex_pl] Thanks for the clarification!

> Move physical records from WAL to another storage 
> --
>
> Key: IGNITE-20697
> URL: https://issues.apache.org/jira/browse/IGNITE-20697
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Aleksey Plekhanov
>Assignee: Aleksey Plekhanov
>Priority: Major
>
> Currently, physical records take up most of the WAL size. But physical records 
> in WAL files are required only for crash recovery, and these records are useful 
> only for a short period of time (since the last checkpoint). 
> The size of physical records written per checkpoint interval is larger than the 
> total size of all pages modified between checkpoints, since we need to store a 
> page snapshot record for each modified page, plus page delta records if a page 
> is modified more than once between checkpoints.
> We process each WAL file several times in the stable workflow (without crashes 
> or rebalances):
>  # We write records to WAL files
>  # We copy WAL files to the archive
>  # We compact WAL files (remove physical records + compress)
> So, in total, we write all physical records twice and read them at least 
> twice.
> To reduce disk workload, we can move physical records to another storage and 
> stop writing them to WAL files. To provide the same crash recovery guarantees, 
> we can write modified pages twice during a checkpoint: first to a delta file, 
> and then to the page storage. In this case, if we crash while writing to the 
> page storage, we can recover any page from the delta file (instead of from the 
> WAL, as we do now).
> This proposal has pros and cons.
> Pros:
>  - Smaller amount of stored data (we don't store page delta records, only the 
> final state of each page)
>  - Reduced disk workload (we additionally write all modified pages once, 
> instead of 2 writes and 2 reads of a larger amount of data)
>  - Potentially reduced latency (instead of writing physical records 
> synchronously during data modification, we write only logical records to the 
> WAL, and physical pages are written by checkpointer threads)
> Cons:
>  - Increased checkpoint duration (we have to write twice the amount of data 
> during a checkpoint)
> Let's try to implement it and benchmark.
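A rough model of the I/O accounting in the description may help when planning the benchmark. This is only a sketch under stated assumptions: the page size, snapshot header size and average delta record size below are hypothetical placeholders (the description gives no sizes), and all function names are made up for illustration. Following the description, each modified page costs one page snapshot record plus one delta record per further modification. The current pipeline writes these records twice (WAL segment, archive copy) and reads them twice (archiver, compactor), while the proposal writes each modified page once more into a delta file.

```python
# Hypothetical sizes in bytes, chosen only for illustration (not Ignite defaults).
PAGE_SIZE = 4096
SNAPSHOT_HEADER = 64      # assumed per-record overhead of a page snapshot record
AVG_DELTA_RECORD = 128    # assumed average size of a page delta record


def physical_record_bytes(mod_counts):
    """Physical WAL record bytes for one checkpoint interval.

    mod_counts[i] is how many times page i was modified since the last
    checkpoint. As in the issue description: one page snapshot record per
    modified page, plus a delta record for every further modification.
    """
    total = 0
    for n in mod_counts:
        if n > 0:
            total += PAGE_SIZE + SNAPSHOT_HEADER
            total += (n - 1) * AVG_DELTA_RECORD
    return total


def final_page_bytes(mod_counts):
    """Bytes needed to keep only the final state of each modified page."""
    return PAGE_SIZE * sum(1 for n in mod_counts if n > 0)


def extra_io_wal(mod_counts):
    """Extra I/O caused by physical records in the current pipeline.

    Written twice: to the WAL segment and to its archive copy.
    Read twice: by the archiver (copy) and by the compactor.
    """
    p = physical_record_bytes(mod_counts)
    return {"written": 2 * p, "read": 2 * p}


def extra_io_delta_file(mod_counts):
    """Extra I/O of the proposal: each modified page is written once more,
    to the checkpoint delta file; it is read back only on crash recovery."""
    return {"written": final_page_bytes(mod_counts), "read": 0}


if __name__ == "__main__":
    mods = [1, 3, 10, 0]  # made-up workload: modification count per page
    print(physical_record_bytes(mods))  # 13888
    print(final_page_bytes(mods))       # 12288
    print(extra_io_wal(mods))           # {'written': 27776, 'read': 27776}
    print(extra_io_delta_file(mods))    # {'written': 12288, 'read': 0}
```

Logical records are left out because both schemes keep them in the WAL. The model counts only I/O volume; it says nothing about the longer checkpoint duration listed under Cons, which is what the benchmark would have to measure.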



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-20697) Move physical records from WAL to another storage

2023-10-20 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1674#comment-1674
 ] 

Aleksey Plekhanov commented on IGNITE-20697:


[~ktkale...@gridgain.com], sure, I plan to create an IEP and write to the dev 
list, but first I want to create a POC.
{quote}It also turns out that if users do not gracefully shut down the cluster 
before switching to a new version of Ignite, they may experience problems 
starting nodes, since there will be a new data recovery mechanism.
{quote}
I suppose we will provide both mechanisms and allow users to configure which one 
is used. In the next release we can use physical records by default; in the 
following release we can switch the default to the checkpoint delta file. On 
recovery, Ignite can decide what to do by analyzing the files of the current 
checkpoint.
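To illustrate how recovery could pick a mechanism from what is on disk rather than from the configured default, here is a small Python sketch. It assumes, hypothetically, that the delta file mechanism writes a distinct start marker and a separate marker once the delta file is fully synced, while a plain start marker means the old WAL mechanism. These file names are invented for the example and are not Ignite's actual checkpoint marker layout. In every case, logical records would still be replayed from the WAL afterwards.

```python
from enum import Enum


class PageRecovery(Enum):
    NOT_NEEDED = "last checkpoint finished; page store is consistent"
    FROM_WAL_PHYSICAL_RECORDS = "restore pages from physical records in the WAL"
    FROM_DELTA_FILE = "copy all pages from the completed checkpoint delta file"
    PAGE_STORE_UNTOUCHED = "delta file incomplete; page store still holds the previous checkpoint"


def choose_page_recovery(cp_files, cp_id):
    """Choose the page recovery step for checkpoint `cp_id` from files on disk.

    Hypothetical file layout, invented for this sketch:
      <cp_id>-START          start marker written by the old (WAL) mechanism
      <cp_id>-START-DELTA    start marker written by the delta file mechanism
      <cp_id>-DELTA-DONE     delta file fully written and fsync'ed
      <cp_id>-END            checkpoint finished
    """
    if f"{cp_id}-END" in cp_files:
        return PageRecovery.NOT_NEEDED
    if f"{cp_id}-START-DELTA" in cp_files:
        if f"{cp_id}-DELTA-DONE" in cp_files:
            # Writes to the page store may be torn, but the delta file
            # holds an intact copy of every page of this checkpoint.
            return PageRecovery.FROM_DELTA_FILE
        # Page store writes start only after the delta file is complete,
        # so the page store was not modified by this checkpoint.
        return PageRecovery.PAGE_STORE_UNTOUCHED
    if f"{cp_id}-START" in cp_files:
        # Checkpoint started by the old mechanism, possibly by an older
        # Ignite version before an upgrade: use WAL physical records.
        return PageRecovery.FROM_WAL_PHYSICAL_RECORDS
    return PageRecovery.NOT_NEEDED  # no checkpoint was in progress
```

Because the decision depends only on the files of the interrupted checkpoint, a node that crashed while running with physical records would still be recovered from the WAL after an upgrade, even if the new version defaults to delta files. The configured default would only affect how new checkpoints are written.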



[jira] [Commented] (IGNITE-20697) Move physical records from WAL to another storage

2023-10-20 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1625#comment-1625
 ] 

Ivan Bessonov commented on IGNITE-20697:


It is also worth mentioning that almost the same idea is already implemented in 
Ignite 3. The main difference is that we created a two-phase checkpoint:
 * Writing the delta file
 * Merging the delta file into the main partition file asynchronously

Some optimizations are still pending, but generally speaking, the implementation 
does work. You may take a look. Porting it as is will be tricky, though. 
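For readers unfamiliar with the Ignite 3 design, the two-phase idea can be sketched as a toy in-memory model in Python. This is not Ignite 3's code: the class and method names are hypothetical, and fsync, file formats and concurrency are left out. As we read the comment, a checkpoint is durable once its delta file is written, the merge into the main partition file happens later, and until then a page read has to consult delta files (newest first) before the main file. A delta file is dropped only after it has been fully merged, so a crash during the merge is handled by simply repeating it.

```python
class TwoPhaseCheckpointStore:
    """Toy in-memory model of a partition with a main file and delta files.

    Phase 1 (checkpoint): dirty pages are written to a new delta file.
    Phase 2 (merge): delta files are later folded into the main file.
    """

    def __init__(self):
        self.main_file = {}     # page_id -> page bytes (main partition file)
        self.delta_files = []   # completed delta files, oldest first

    def checkpoint(self, dirty_pages):
        # Phase 1: write a full copy of every dirty page into a fresh delta file.
        delta = dict(dirty_pages)
        # A real implementation would fsync the file here before publishing it.
        self.delta_files.append(delta)

    def read_page(self, page_id):
        # The newest delta file wins; fall back to the main file.
        for delta in reversed(self.delta_files):
            if page_id in delta:
                return delta[page_id]
        return self.main_file.get(page_id)

    def merge_oldest(self, crash_after=None):
        """Phase 2: copy the oldest delta file into the main file.

        crash_after simulates a crash after that many page writes; the delta
        file is kept, so the merge can simply be repeated after restart.
        """
        if not self.delta_files:
            return
        delta = self.delta_files[0]
        for i, (page_id, data) in enumerate(delta.items()):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash during merge")
            self.main_file[page_id] = data
        # Drop the delta file only after all its pages reached the main file.
        self.delta_files.pop(0)
```

A crash in the middle of `merge_oldest` leaves the main file partially updated, but `read_page` still returns the newest page versions from the delta files, and calling `merge_oldest` again after restart completes the merge.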



[jira] [Commented] (IGNITE-20697) Move physical records from WAL to another storage

2023-10-20 Thread Kirill Tkalenko (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1616#comment-1616
 ] 

Kirill Tkalenko commented on IGNITE-20697:
--

Hello [~alex_pl]! I think such a feature needs to be designed and implemented 
through an IEP; it looks like serious and substantial work.

It also turns out that if users do not gracefully shut down the cluster before 
switching to a new version of Ignite, they may experience problems starting 
nodes, since there will be a new data recovery mechanism.
