On Tue, Nov 18, 2025 at 2:28 AM Yugo Nagata <[email protected]> wrote: > I've attached a patch that adds the following comment: > > + * nextkey determines how the scankey's boundary is interpreted, and backward > + * indicates a backward scan. See comments in _bt_first for a more detailed > + * explanation of these fields. > > What do think?
Seems reasonable to me. I wonder if we also do something about these existing _bt_binsrch comments: * On an internal (non-leaf) page, _bt_binsrch() returns the OffsetNumber * of the last key < given scankey, or last key <= given scankey if nextkey * is true. (Since _bt_compare treats the first data key of such a page as * minus infinity, there will be at least one key < scankey, so the result * always points at one of the keys on the page.) Here we describe what happens on an internal page. This is correct, but *seems* to contradict the higher level comments at the end of _bt_first. There is actually no contradiction between the two -- not when you understand that _bt_first describes the whole end-to-end process of calling _bt_search and then calling _bt_binsrch on the leaf page (not of calling _bt_binsrch once, against an internal page). One could think of this _bt_binsrch internal page behavior as compensating for the fact that internal pages have pivot tuples that consist of a separator key (except for the first key, which is just has a -inf key/no key), plus a downlink that points to the *next* page after that separator key one level down (except for the internal page high key, which has no downlink at all). Might be useful to say something like that instead. This is hard to explain without an example. Right now, an internal page might have pivot tuples that look like this: Separator: -inf, Downlink: 1 Separator: 'a', Downlink: 2 Separator: 'c', Downlink: 3 Separator: 'f', Downlink: (none, this is the high key) But _bt_binsrch "pretends" that our internal page actually contains: Downlink: 1 Separator: 'a' Downlink: 2 Separator: 'c' Downlink: 3 Separator: 'f' So if our = scan key is (say) 'c', we should descend using the downlink that points to block 2 (not the one that points to block 3, as might be expected from looking at the real physical representation consisting of pivot tuples). That is what the scan needs to do to get to the page one level down whose high key is also 'c' -- that's where the scan needs to look. (If the next level down is the leaf level, then the leaf page we descend to might also contain a *non*-pivot tuple with the key value 'c', "right before" the high key with 'c', which the scan will need to return in _bt_readpage. Lehman & Yao allow the key before a leaf page's high key to be equal to the high key, but otherwise forbid all duplicate keys.) I find it very hard to know what explanation will be the least confusing to other people, at least in this area. Since you're interested in this area, I thought I'd ask what you think. (Here I'm pretending that the keys in the tree are unique without needing a heap TID, per classic L&Y, which is unrealistic but simplifies the example.) -- Peter Geoghegan
