Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

Peter Geoghegan Tue, 22 Dec 2015 10:50:55 -0800

On Tue, Dec 22, 2015 at 9:31 AM, Robert Haas <[email protected]> wrote:
>> It has some nice properties, because unsigned integers are so simple
>> and flexible. For example, we can do it with int4 sometimes, and then
>> compare two attributes at once on 64-bit platforms. Maybe the second
>> attribute (the second set of 4 bytes) isn't an int4, though -- it's
>> any other type's abbreviated key (which can be assumed to have a
>> compatible representation). That's one more advanced possibility.
>
> Yikes.


I'm not suggesting we'd want to do that immediately, or even at all.
My point is -- stuff like this becomes possible. My intuition is that
it might come up in a bunch of places.

>> Another issue is that abbreviated keys in indexes are probably not
>> going to take 8 bytes, because they'll go in the ItemId array. It's
>> likely to be very useful to be able to take the first two bytes, and
>> know that a uint16 comparison is all that is needed, and have a basis
>> to perform an interpolation search rather than a binary search
>> (although that needs to be adaptive to different distributions, I
>> think it could be an effective technique -- there are many cache
>> misses for binary searches on internal B-Tree pages).
>
> Hmm, that seems iffy to me.  There are plenty of data sets where 8
> bytes is enough to get some meaningful information about what part of
> the keyspace you're in, but 2 bytes isn't.

Your experience with abbreviated keys for sorting is unlikely to
perfectly carry over here. That's because the 2 bytes are only put in
the internal pages (typically less than 1% of the total). These
internal pages typically have relatively low density anyway (we don't
use the The B* tree technique), so I think that the level of bloat
would be tiny. At the same time, these are the range of values that
naturally have very wide fan-out. For example, the entire range of
values in the index is represented at the root page.

All of that having been said, yes, it could still be useless. But
then, the overhead will still tend to be nill, due to the fact that
abbreviated key generation can be so effectively amortized (it happens
only once, per page split), and because branch prediction will tend to
remove any penalty.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

Reply via email to