On 6 January 2012 21:14, Tom Lane <t...@sss.pgh.pa.us> wrote:
> When there are lots of duplicates of a particular indexed value, the
> existing code will cause an indexscan to search them in physical order,
> whereas if we remove the existing logic it'll be random --- in
> particular, accesses to the same heap page can no longer be expected to
> be clustered.

Isn't it possible to get them in physical order anyway, by reading
them into memory in that order? Efficient quick sort implementations
are not stable, and ours is no exception, but we could perhaps come up
with a cheaper tie-breaker value at that stage, if you're determined
to maintain this behaviour. We have sufficient incentive to, as I
describe below.

> Admittedly, I don't have any numbers quantifying just how useful that
> might be, but on the other hand you've not presented any evidence
> justifying removing the behavior either.  If we believe your position
> that indexes don't generally have lots of duplicates, then the code in
> question will seldom be reached and therefore there would be no
> performance benefit in removing it.

I ran the same btree benchmark on master, but without the "cheap
insurance". The results were interesting, to say the least.

The columns indexed were the same columns and data that I've been
using all along. Initially this made sense, as the performance of
multi sort key sorts often largely hinged on being able to get away
with doing one comparison per pair of tuples - with many duplicates, I
could avoid cheating and show something closer to worst case for the

I didn't think it mattered that indexing the same columns would
produce what happened to be a not so useful index in the real world,
due to having so many duplicates - better to have figures that were
somewhat comparable for btree tuple sorting and heap tuple sorting.

When I ran the same benchmark on a server that differed from master
only in that their was no insurance, it momentarily appeared that
*all* of the gains for btree index creation came from being able to
elide the "cheap insurance", but only where it would have to be paid
for a high number of times.

I soon realised that I'd made a blunder: the code (that is, the patch
that I posed most recently) wasn't even using my specialisation for
qsorting, because the SortSupport pointer was null! I did not have
tuplesort_begin_index_btree initialise the SortSupport struct as
tuplesort_begin_heap did, so my earlier benchmark was effectively
meaningless, except that it indicated the benefits of eliding the
cheap insurance alone, if only for that not so compelling case. You
should note that the benefits of not paying for the insurance can be
very significant indeed.

Attached are figures for an identical run of the same btree python
script, but with a version of the patch that actually uses my
specialisations. Granted, these numbers are still partially predicated
on the index in question having a large number of duplicates, but it's
worth noting:

1. The gain from specialisation isn't bad; not as good as the
improvements we saw for heap tuples, but not so bad either, especially
considering that binary bloat should be much less controversial for
btree tuples.

2. The index that results from the tests is still useful; the planner
is perfectly willing to use it rather than than performing an
in-memory sort. It will also use it to satisfy a query like "select *
from orderlines where prod_id = 5", albeit via a bitmap index scan. I
took the precaution of increasing default_statistics_target to its
maximum value, as well an performing an analyze on orderlines in
advance of checking this.

Revision to this patch that fixes the bug to follow - I produced these
new numbers from a rough cut of that.

Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment: results_server_btree_w_specializations.ods
Description: application/vnd.oasis.opendocument.spreadsheet

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to