Re: [HACKERS] PoC: Partial sort

Alexander Korotkov Fri, 12 Sep 2014 14:21:05 -0700

On Sun, Jul 13, 2014 at 6:45 AM, Peter Geoghegan <[email protected]> wrote:

> On Mon, Feb 10, 2014 at 10:59 AM, Alexander Korotkov
> <[email protected]> wrote:
> > Done. Patch is splitted.
>
> I took a quick look at this.
>
> Have you thought about making your new cmpSortSkipCols() function not
> use real comparisons? Since in the circumstances in which this
> optimization is expected to be effective (e.g. your original example)
> we can also expect a relatively low cardinality for the first n
> indexed attributes (again, as in your original example), in general
> when cmpSortSkipCols() is called there is a high chance that it will
> return true. If any pair of tuples (logically adjacent tuples fed in
> to cmpSortSkipCols() by an index scan in logical order) are not fully
> equal (i.e. their leading, indexed attributes are not equal) then we
> don't care about the details -- we just know that a new sort grouping
> is required.
>

Actually, higher cardinality skip columns is better. Sorting of smaller
groups is faster than sorting larger groups of same size. Also, with
smaller groups you achieve limit more accurate (in average), i.e. sort
smaller amount of total rows.

> The idea here is that you can get away with simple binary equality
> comparisons, as we do when considering HOT-safety. Of course, you
> might find that two bitwise unequal values are equal according to
> their ordinary B-Tree support function 1 comparator (e.g. two numerics
> that differ only in their display scale). AFAICT this should be okay,
> since that just means that you have smaller sort groupings than
> strictly necessary. I'm not sure if that's worth it to more or less
> duplicate heap_tuple_attr_equals() to save a "mere" n expensive
> comparisons, but it's something to think about (actually, there are
> probably less than even n comparisons in practice because there'll be
> a limit).
>

Not correct. Smaller groups are not OK. Imagine that two representations of
same skip column value exists. Index may return them in any order, even
change them one by one. In this case sorting on other column never takes
place, while it should. But some optimizations are still possible:

   1. Use bitwise comparison first, then recheck. But, no guarantees that
   acceleration will be achieved.
   2. Use equality check instead of btree comparison. For "text" datatype
   it would be rather faster because of no locale-aware comparison.

------
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

Reply via email to