Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

Heikki Linnakangas Fri, 08 Mar 2019 02:12:48 -0800

On 08/03/2019 12:22, Peter Geoghegan wrote:

I would like to work through these other items with you
(_bt_binsrch_insert() and so on), but I think that it would be helpful
if you made an effort to understand the minusinfkey stuff first. I
spent a lot of time improving the explanation of that within
_bt_compare(). It's important.

Ok, after thinking about it for a while, I think I understand the minusinfinity stuff now. Let me try to explain it in my own words:

Imagine that you have an index with two key columns, A and B. The indexhas two leaf pages, with the following items:


+--------+   +--------+
| Page 1 |   | Page 2 |
|        |   |        |
|    1 1 |   |    2 1 |
|    1 2 |   |    2 2 |
|    1 3 |   |    2 3 |
|    1 4 |   |    2 4 |
|    1 5 |   |    2 5 |
+--------+   +--------+

The key space is neatly split on the first key column - probably thanksto the new magic in the page split code.

Now, what do we have as the high key of page 1? Answer: "2 -inf". The"-inf" is not stored in the key itself, the second key column is justomitted, and the search code knows to treat it implicitly as a valuethat's lower than any real value. Hence, "minus infinity".

However, during page deletion, we need to perform a search to find thedownlink pointing to a leaf page. We do that by using the leaf page'shigh key as the search key. But the search needs to treat it slightlydifferently in that case. Normally, searching with a single key value,"2", we would land on page 2, because any real value beginning with "2"would be on that page, but in the page deletion case, we want to findpage 1. Setting the BTScanInsert.minusinfkey flag tells the search codeto do that.

Question: Wouldn't it be more straightforward to use "1 +inf" as page1's high key? I.e treat any missing columns as positive infinity. Thatway, the search for page deletion wouldn't need to be treateddifferently. That's also how this used to work: all tuples on a pageused to be <= high key, not strictly < high key. And it would also makethe rightmost page less of a special case: you could pretend therightmost page has a pivot tuple with all columns truncated away, whichmeans positive infinity.


You have this comment _bt_split which touches the subject:

        /*
         * The "high key" for the new left page will be the first key that's 
going
         * to go into the new right page, or possibly a truncated version if 
this
         * is a leaf page split.  This might be either the existing data item at
         * position firstright, or the incoming tuple.
         *
         * The high key for the left page is formed using the first item on the
         * right page, which may seem to be contrary to Lehman & Yao's approach 
of
         * using the left page's last item as its new high key when splitting on
         * the leaf level.  It isn't, though: suffix truncation will leave the
         * left page's high key fully equal to the last item on the left page 
when
         * two tuples with equal key values (excluding heap TID) enclose the 
split
         * point.  It isn't actually necessary for a new leaf high key to be 
equal
         * to the last item on the left for the L&Y "subtree" invariant to hold.
         * It's sufficient to make sure that the new leaf high key is strictly
         * less than the first item on the right leaf page, and greater than or
         * equal to (not necessarily equal to) the last item on the left leaf
         * page.
         *
         * In other words, when suffix truncation isn't possible, L&Y's exact
         * approach to leaf splits is taken.  (Actually, even that is slightly
         * inaccurate.  A tuple with all the keys from firstright but the heap 
TID
         * from lastleft will be used as the new high key, since the last left
         * tuple could be physically larger despite being opclass-equal in 
respect
         * of all attributes prior to the heap TID attribute.)
         */


But it doesn't explain why it's done like that.

- Heikki

Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

Reply via email to