Hi hackers,

Together with Kirk and Nik, I spent several online hacking sessions tackling 
index bloat problems [0,1,2]. We concluded that merging B-tree index pages 
should help keep a bloated index from pressuring shared_buffers.

We are proposing a feature to automatically merge nearly-empty B-tree leaf 
pages during VACUUM operations to reduce index bloat. This addresses a common 
issue where deleted tuples leave behind sparsely populated pages that 
traditional page deletion cannot handle because they're not completely empty.

*** Implementation Overview:

The patch adds a new `mergefactor` reloption (default 5%, range 0-50%) that 
defines when a page becomes a candidate for merging. During vacuum, when a 
page's live-tuple occupancy falls below this threshold (i.e., the page is at 
least 95% empty with the default setting), we attempt to move the remaining 
tuples to the right sibling page and then delete the now-empty source page 
using the existing page deletion mechanism.
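For illustration, the candidate check boils down to comparing live-tuple bytes 
against the threshold. This is only a hedged sketch with invented names 
(`is_merge_candidate`, `used_bytes`), not the patch's actual code, which works 
on real page headers inside `btvacuumpage()`:

```c
#include <stdbool.h>

/* Sketch only; names are illustrative. With mergefactor = 5, a leaf page
 * qualifies when its remaining tuples occupy at most 5% of the usable
 * space, i.e. the page is at least 95% empty. */
static bool
is_merge_candidate(int used_bytes, int usable_bytes, int mergefactor)
{
    return used_bytes * 100 <= usable_bytes * mergefactor;
}
```

Note that mergefactor = 0 effectively disables merging of any page that still 
holds tuples, leaving only the existing empty-page deletion path.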

Key changes:
- New `mergefactor` reloption for configurable merge thresholds  
- Detection logic in `btvacuumpage()` to identify merge candidates
- Tuple movement implementation in `_bt_unlink_halfdead_page()`
- WAL logging enhancements to handle cross-page dependencies during replay

The last part still needs work (it's simply REGBUF_FORCE_IMAGE for now), but I 
want to start a discussion and ask about known problems with the approach.

*** Correctness:

The implementation reuses the existing locking, critical section, and WAL 
logging infrastructure. To handle cross-page dependencies during WAL replay 
when tuples are merged, the right sibling buffer is registered with 
`REGBUF_FORCE_IMAGE`; this is a temporary workaround.
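Concretely, the workaround amounts to registering the sibling with the 
force-image flag when assembling the WAL record. A hedged fragment (buffer 
variable names are made up; this is not the patch verbatim):

```c
/* Fragment for illustration only; leafbuf/rightbuf are invented names. */
XLogBeginInsert();
XLogRegisterBuffer(0, leafbuf, REGBUF_STANDARD);     /* page being emptied */
XLogRegisterBuffer(1, rightbuf, REGBUF_FORCE_IMAGE); /* sibling: redo restores a
                                                      * full-page image instead of
                                                      * replaying the moved tuples */
```

A proper redo routine would replay the tuple movement itself rather than 
forcing a full-page image of the sibling.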

*** Current Status & Questions:

The patch successfully reduces index bloat and handles basic scenarios, but 
we've identified some areas that need community input:

1. Backward Scan Correctness: The primary concern is ensuring that backward 
scans work correctly when pages are merged concurrently. Since we maintain the 
same locking protocol as existing page deletion, I believe this should be safe, 
but I would appreciate expert review of the approach.
2. Performance Impact: The additional checks during vacuum have minimal 
overhead, but broader testing would be valuable. The worst case is an index 
whose leaf pages alternate between nearly empty and nearly full 
(5%, 96%, 5%, 96%, ...): every vacuum pass attempts the merge and pays the 
cost of acquiring locks, but the right sibling never has room.
3. WAL Consistency: There are still some edge cases with WAL consistency 
checking that need refinement. I think I can handle them; I just need to spend 
enough time debugging a real redo routine instead of imaging the right page.
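To make the worst case in point 2 concrete, here is a hedged toy model (all 
names invented, not patch code) of leaf fill factors alternating between 5% 
and 96%: every nearly-empty page is a candidate, but its right sibling never 
has room, so each vacuum pass pays for locking and merges nothing:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: fill_pct[i] is the fill percentage of leaf page i. */
static bool
merge_would_succeed(int src_pct, int dst_pct)
{
    return src_pct + dst_pct <= 100;    /* moved tuples must fit on the sibling */
}

/* Count merge attempts that take locks but cannot move any tuples. */
static int
count_wasted_attempts(const int *fill_pct, int npages, int mergefactor)
{
    int wasted = 0;

    for (int i = 0; i + 1 < npages; i++)
        if (fill_pct[i] <= mergefactor &&
            !merge_would_succeed(fill_pct[i], fill_pct[i + 1]))
            wasted++;
    return wasted;
}
```

In the alternating pattern every candidate attempt is wasted, which is why the 
detection logic may need a cheap "will the sibling fit these tuples" pre-check 
before taking locks.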


*** Usage:
CREATE INDEX ON table (col) WITH (mergefactor=10);  -- 10% threshold
I'm not sure whether it would be a good idea to allow enabling mergefactor on 
existing indexes (e.g., via ALTER INDEX ... SET).

The feature is particularly beneficial for workloads with high update/delete 
rates that create sparse index pages without triggering complete page deletion.

I'm attaching the patch for review and would welcome feedback on the approach, 
especially regarding backward scan safety and any other correctness concerns we 
may have missed.

Thank you!


Best regards,
Andrey, Nik, Kirk.


[0] https://www.youtube.com/watch?v=3MleDtXZUlM
[1] https://www.youtube.com/watch?v=Ib3SXSFt8mE
[2] https://www.youtube.com/watch?v=D1PEdDcvZTw

Attachment: v1-0001-btree-Add-page-merge-during-vacuum-to-reduce-inde.patch