On Wed, Sep 16, 2015 at 8:53 AM, Nicolas Barbier
<nicolas.barb...@gmail.com> wrote:
> After thinking about it a bit more, it indeed seems never useful to
> have f3 in the internal nodes if it is not part of the columns that
> determine the UNIQUE property. It could as well be pushed out of the
> internal nodes and only appear in the leaf nodes.

Correct. That's a standard technique in B-Tree implementations like
our own; suffix truncation can remove unneeded information from the
end of values, possibly including entire attributes, which can be
truncated in a way that is similar to this patch.

The difference here is only that this patch does not dynamically
determine which attributes can be removed while re-finding the parent
downlink in the second phase of a page split (the usual place it
happens with standard suffix truncation). Rather, this patch knows "a
priori" that it can truncate attributes that are merely "included"
attributes. That means that this patch has as much to do with
increasing B-Tree fan-out for these indexes as it does for making
unique indexes more usable for index-only scans. Both of those goals
are important, IMV.

This patch seems pretty cool. I noticed some issues following a quick
read though the patch including_columns_6.0.patch that Anastasia
should look into:

* You truncate (remove suffix attributes -- the "included" attributes)
within _bt_insertonpg():

-   right_item = CopyIndexTuple(item);
+   indnatts = IndexRelationGetNumberOfAttributes(rel);
+   indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+   if (indnatts != indnkeyatts)
+   {
+       right_item = index_reform_tuple(rel, item, indnatts, indnkeyatts);
+       right_item_sz = IndexTupleDSize(*right_item);
+       right_item_sz = MAXALIGN(right_item_sz);
+   }
+   else
+       right_item = CopyIndexTuple(item);
    ItemPointerSet(&(right_item->t_tid), rbkno, P_HIKEY);

I suggest that you do this within _bt_insert_parent(), instead, iff
the original target page is know to be a leaf page. That's where it
needs to happen for conventional suffix truncation, which has special
considerations when determining which attributes are safe to truncate
(or even which byte in the first distinguishing attribute it is okay
to truncate past). Conventional suffix truncation (not this patch)
could truncate, say, "C" collation text past the first distinguishing
byte, where the byte distinguishes the new downlink being inserted
into the parent page compared to both the left downlink and right
downlink in the parent page-- the minimum amount of information to
correctly guide later index scans is only stored. But it isn't correct
(again, with conventional suffix truncation) to do this passed the
leaf level. It must end there.

It isn't safe past the leaf level (by which I mean when inserting a
downlink into its parent, one level up) because applying suffix
truncation again for the next level up might guide a search to the
highest node in the left sub-tree rather than to the lowest node in
the right sub-tree, or vice versa. In general, we must be careful
about "cousin" nodes, that are beside each other but are not
"siblings" due to not sharing the same parent. It doesn't really
matter that this restriction exists, because you get almost all the
benefit at the leaf -> immediate parent level anyway. Higher levels
will reuse already truncated Index Tuples, that are typically
"truncated enough".

So, this should work in a similar way to conventional suffix
truncation (BTW, you should work on that later). And so, it should
just do it there. Besides, checking it only where it could possibly
help is clearer, since as written the code in _bt_insertonpg() will
never need to truncate following a non-leaf/internal page split.

* I think the comparison logic may have a bug.

Does this work with amcheck? Maybe it works with bt_index_check(), but
not bt_index_parent_check()? I think that you need to make sure that
_bt_compare() knows about this, too. That's because it isn't good
enough to let a truncated internal IndexTuple compare equal to a
scankey when non-truncated attributes are equal. I think you need to
have an imaginary "minus infinity" attribute past the first
non-truncated attribute (i.e. "minus infinity value" for the first
*truncated* attribute). That way, the downlinks will always be lower
bounds when the non-"included"/truncated attributes are involved. This
seems necessary. No?

It's necessary because you aren't storing any attributes, so it's not
acceptable to even attempt a comparison -- I think that will segfault
(doesn't matter that the index scan wouldn't have returned anything
anyway). I think it's also necessary because of  issues with "cousin"
nodes making index scans lose their way.

That's all I have right now. Nice work.
Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to