On Fri, Sep 12, 2014 at 2:19 PM, Alexander Korotkov <aekorot...@gmail.com> wrote: > Actually, higher cardinality skip columns is better. Sorting of smaller > groups is faster than sorting larger groups of same size. Also, with smaller > groups you achieve limit more accurate (in average), i.e. sort smaller > amount of total rows.
Higher cardinality leading key columns are better for what? Do you mean they're better in that those cases are more sympathetic to your patch, or better in the general sense that they'll perform better for the user? The first example query, that you chose to demonstrate your patch had a leading, indexed column (column "v1") with much lower cardinality/n_distinct than the column that had to be sorted on (column "v2"). That was what my comments were based on. When this feature is used, there will often be a significantly lower cardinality in the leading, indexed column (as in your example). Otherwise, the user might well have been happy to just order on the indexed/leading column alone. That isn't definitely true, but it's often true. >> I'm not sure if that's worth it to more or less >> duplicate heap_tuple_attr_equals() to save a "mere" n expensive >> comparisons, but it's something to think about (actually, there are >> probably less than even n comparisons in practice because there'll be >> a limit). > > Not correct. Smaller groups are not OK. Imagine that two representations of > same skip column value exists. Index may return them in any order, even > change them one by one. In this case sorting on other column never takes > place, while it should. I think I explained this badly - it wouldn't be okay to make the grouping smaller, but as you say we could tie-break with a proper B-Tree support function 1 comparison to make sure we really had reached the end of our grouping. FWIW I want all bttextcmp()/varstr_cmp() comparisons to try a memcmp() first, so the use of the equality operator with text in mind that you mention may soon not be useful at all. The evidence suggests that memcmp() is so much cheaper than special preparatory NUL-termination + strcoll() that we should always try it first when sorting text, even when we have no good reason to think it will work. The memcmp() is virtually free. And so, you see why it might be worth thinking about this when we already have reasonable confidence that many comparisons will indicate that datums are equal. Other datatypes will have expensive "real" comparators, not just text or numeric. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers