I have thought a little further about my proposal to have inner B-Tree
pages store strxfrm() blobs [1]; my earlier conclusion was that this
could not be trusted when equality was indicated [2] due to the
"Hungarian problem" [3]. As I noted earlier, that actually doesn't
really matter, because we may have missed what's really so useful
about strxfrm() blobs: We *can* trust a non-zero result as a proxy for
what we would have gotten with a proper bttextcmp(), even if a
strcmp() only looks at the first byte of a blob for each of two text
strings being compared. Here is what the strxfrm() blob looks like for
some common English words when put through glibc's strxfrm() with the
en_US.UTF-8 collation (I wrote a wrapper function to do this and
output bytea a couple of years ago):

[local]/postgres=# select strxfrm_test('Yellow');
(1 row)

[local]/postgres=# select strxfrm_test('Red');
(1 row)

[local]/postgres=# select strxfrm_test('Orange');
(1 row)

[local]/postgres=# select strxfrm_test('Green');
(1 row)

[local]/postgres=# select strxfrm_test('White');
(1 row)

[local]/postgres=# select strxfrm_test('Black');
(1 row)

Obviously, all of these blobs, while much larger than the original
string still differ in their first byte. It's almost as if they're
intended to be truncated.

The API I envisage is a new support function 3 that operator class
authors may optionally provide. Within the support function a text
varlena argument is passed, or whatever the type that relates to the
opclass happens to be. It returns a Datum to the core system. That
datum is always what is actually passed to their sort support routine
(B-Tree support function number 2) iff a support function 3 is
provided (you must provide number 2 if you provide number 3, but not
vice-versa). In respect of support-function-3-providing opclasses, the
core system is entitled to take the sort support return value as a
proxy for what a proper support function 1 call would indicate iff the
sort support routine does not return 0 in respect of two
support-function-3 blobs (typically strxfrm() blobs truncated at 8
bytes for convenient storage as pass-by-value Datums). Otherwise, a
proper call to support function 1, with fully-formed text arguments is

I see at least two compelling things we can do with these blobs:

1. Store them as pseudo-columns of inner B-Tree IndexTuples (not leaf
pages). Naturally, inner pages are very heterogeneous, so only having
8 bytes is very probably an excellent trade-off there. Typically 1-3%
of B-Tree pages are inner pages, so the bloat risk seems acceptable.

2. When building a SortTuple array within TupleSort, we can store far
more of these truncated blobs in memory than we can proper strings. if
SortTuple.datum1 (the first column to sort on among tuples being
sorted, which is currently stored in memory as an optimization) was
just an 8 byte truncated strxfrm() blob, and not a pointer to a text
string in memory, that would be pretty great for performance for
several reasons. So just as with B-Tree inner pages, for SortTuples
there can be a pseudo leading key - we need only compare additional
sort keys/heap_getattr() when we need a tie-breaker, when those 8
bytes aren't enough to reach a firm conclusion.

It doesn't just stop with strxfrm() blobs, though. Why couldn't we
create blobs that can be used as reliable proxies for numerics, that
are just integers? Sure, you need to reserve a bit to indicate an
inability to represent the original value, and possibly work out other
details like that, but the only negative thing you can say about that,
or indeed applying these techniques to any operator class is that it
might not help in certain worst cases (mostly contrived cases). Still,
the overhead of doing that bit of extra work is surely quite low
anyway - at worst, a extra few instructions wasted per comparison -
making these techniques likely quite profitable for the vast majority
of real-world applications.

Does anyone else want to pick this idea up and run with it? I don't
think I'll have time for it.




Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to