On Fri, May 8, 2026 at 10:40 AM Salma El-Sayed <[email protected]> wrote:
> A forward scanner that arrives at L after the merge sees BTP_MERGED_AWAY and
> follows through to R.
> A backward scanner that arrives at R after the merge sees BTP_MERGED, reads R
> (which now contains L's data), and skips L entirely.

This seems OK for the first merge, but I think you need to be a lot more explicit about what's going to happen after that. For instance, what if you want to perform a merge on a page that is already marked BTP_MERGED? Or, for example, what happens if more splits happen after the merge?

Like if we have page A and then page B, we might mark A BTP_MERGED_AWAY and B BTP_MERGED. Now suppose at the time this happens, a scan is pointing to page A. Before the scan advances to page B, that page gets split, so now we have: A(BTP_MERGED_AWAY) B0 (???) B1 (???). The problem is that we've already read page A, so we need to use the logic that skips over tuples we may have already read on both B0 and B1, both of which contain some of the tuples from B, which now includes everything that we already read from A. So presumably to make that work, we need to mark both B0 and B1 as BTP_MERGED. But if we do that, then there's no longer a 1:1 relationship between BTP_MERGED_AWAY pages and BTP_MERGED pages. When we come to a BTP_MERGED page, we don't know if it corresponds to some BTP_MERGED_AWAY page we previously encountered, or some other BTP_MERGED_AWAY page from long ago.

I'm not certain, but I am suspicious that using flag bits for this is not going to work out. Maybe a flag bit is OK for the page that is going away, because then it eventually transitions to half-dead like you said. But for the surviving page, if that's just indicated by it being marked BTP_MERGED, then eventually we can just end up with tons of BTP_MERGED pages in the heap and there's nothing to unset those bits. That's probably going to break something; if it doesn't, then it seems unclear that BTP_MERGED needs to exist in the first place. I feel like we might need to mark the surviving page using some kind of indicator that "times out," like an XID or something, so that we don't have to go back and clear BTP_MERGED flags later. But I don't really know.

> How should the merge process be triggered?

This seems really tricky. I think if the user has to manually run a "try to merge pages" command or function, this functionality won't get used very much. Ideally it would happen either automatically during foreground operation, or as part of VACUUM. But that seems complicated to make work, because there's a risk of merging pages too aggressively, which could not only waste work but result in them being split again soon afterward.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
