On 06/02/14 18:21, Jeff Janes wrote:
  How big of
sets were you sorting in each case?

Big enough to go external.  The timings and compare-counts given are
purely for the heapify stage not the total for the sort, so are
constrained by the work_mem not by the sort size per se.

I'm limited to working on a small machine, so the work_mem value
of 1e+6 is approaching my top limit of sort-time pain unless I rip the
heapify out into a test harness.  What range of work_mem is it useful
to test over?

 Was it always just slightly bigger
than would fit in work_mem?

I didn't check.

 Did you try sorting already-sorted, reverse
sorted, or pipe-organ shaped data sets?

Not yet, but I can.  What variety of pipe-organ shape is of
interest (I can think of several that might be called that)?

 We will also need to test it on
strings.  I usually use md5(random()::text) to generate strings for such
purposes, at least for a first pass.

OK.


The name of the attached patch is a bit unfortunate.

Is there a project standard for this?

 And I'm not sure what
you are doing with tupindex, the change there doesn't seem to be necessary
to the rest of your work, so some explanation on that would be helpful.

I'll add commentary.

The project coding style usually has comments in blocks before loops on
which they comment, rather than interspersed throughout the clauses of the
"for" statement.

I'll adjust that.

Another optimization that is possible is to do only one comparison at each
level (between the two children) all the way down the heap, and then
compare the parent to the chain of smallest children in reverse order.
  This can substantially reduce the number of comparisons on average,
because most tuples sift most of the way down the heap (because most of the
tuples are at the bottom of the heap).

Sounds interesting; I'll see if I can get that going.

(Also, I think you should make your siftdown code look more like the
existing siftdown code.)

A function called by the outer loop?  Can do; the existing does that
because the function is called from multiple places but this will only
be used the once.

Thanks for the review.
--
Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to