Hi, hackers!

Here's new version of GiST VACUUM patch set aimed at CF 2018-09.
Original thread can be found at [0]

Currently GiST never reuse pages even if they are completely empty. This can 
lead to a bloat, if a lot of index tuples are deleted from key space, which do 
not receive newly inserted tuples.
First patch fixes that issue: empty pages are collected and reused for new page 

Second patch improves scanning algorithm to read GiST blocks in physical order. 
This can dramatically increase speed of scanning, especially on HDD.
This scan is using relatively big chunk of memory to build map of whole GiST 
graph. If there is not enough maintenance memory, patch had the fallback to old 
GiST VACUUM (from first patch).

==How logical scan works==
GiST VACUUM scans graph in DFS search to find removed tuples on leaf pages. It 
remembers internal pages that are referencing completely empty leaf pages.
Then in additional step, these pages are rescanned to delete references and 
mark leaf pages as free.

==How physical scan works==
Scan builds array of GistPSItem (Physical scan item). GistPSItem contains List 
of offsets pointing to potentially empty leaf pages and the information 
necessary to collect that list in one sequential file read.
Array of GistPSItems has one item for each GiST block.
When we encounter leaf page, if it is empty - we mark it empty and jump to 
parent (if known) to update it's list.
When we encounter internal page, we check GistPSItem of every child block to 
check if it is empty leaf and to mark parent ptr there.

At least one reference on each internal pages is left undeleted to preserve 
balancing of the tree.
Pages that has FOLLOW-RIGHT flag also are not deleted, even if empty.

Thank you for your attention, any thoughts are welcome.

Best regards, Andrey Borodin.


Attachment: 0001-Delete-pages-during-GiST-VACUUM-v3.patch
Description: Binary data

Attachment: 0002-Physical-GiST-scan-during-VACUUM-v3.patch
Description: Binary data

Reply via email to