On Thu, 2006-01-26 at 00:33 +0000, Simon Riggs wrote:
> The enclosed patch substantially improves large sort performance, in the
> general case
>
> cvstip: elapsed 5693 sec, CPU 196 sec
> patch:  elapsed 4132 sec, CPU  90 sec
>

The following patch implements the matching sort cost calculations in the
planner, in cost_sort(). This patch is in addition to the previous patch.

Best Regards,
Simon Riggs

Index: src/backend/optimizer/path/costsize.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v
retrieving revision 1.152
diff -c -r1.152 costsize.c
*** src/backend/optimizer/path/costsize.c	28 Dec 2005 01:29:59 -0000	1.152
--- src/backend/optimizer/path/costsize.c	31 Jan 2006 19:30:15 -0000
***************
*** 70,79 ****
  #include "utils/selfuncs.h"
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
  
! #define LOG2(x)  (log(x) / 0.693147180559945)
- #define LOG6(x)  (log(x) / 1.79175946922805)
  
  /*
   * Some Paths return less than the nominal number of rows of their parent
--- 70,78 ----
  #include "utils/selfuncs.h"
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
! #include "utils/tuplesort.h"
  
  #define LOG2(x)  (log(x) / 0.693147180559945)
  
  /*
   * Some Paths return less than the nominal number of rows of their parent
***************
*** 767,779 ****
   * If the total volume exceeds work_mem, we switch to a tape-style merge
   * algorithm.  There will still be about t*log2(t) tuple comparisons in
   * total, but we will also need to write and read each tuple once per
!  * merge pass.  We expect about ceil(log6(r)) merge passes where r is the
!  * number of initial runs formed (log6 because tuplesort.c uses six-tape
!  * merging).  Since the average initial run should be about twice work_mem,
   * we have
!  *		disk traffic = 2 * relsize * ceil(log6(p / (2*work_mem)))
   *		cpu = comparison_cost * t * log2(t)
   *
   * The disk traffic is assumed to be half sequential and half random
   * accesses (XXX can't we refine that guess?)
   *
--- 766,783 ----
   * If the total volume exceeds work_mem, we switch to a tape-style merge
   * algorithm.  There will still be about t*log2(t) tuple comparisons in
   * total, but we will also need to write and read each tuple once per
!  * merge pass.  We expect about ceil(logT(r)) merge passes where r is the
!  * number of initial runs formed, and logT means logarithm-in-base-T because
!  * tuplesort.c uses a variable number of logical tapes (T) to perform merging.
!  * Since the average initial run should be about twice work_mem,
   * we have
!  *		num logical tapes = work_mem / OPTIMAL_MERGE_BUFFER_SIZE
!  *		disk traffic = 2 * relsize * ceil(logT(p / (2*work_mem)))
   *		cpu = comparison_cost * t * log2(t)
   *
+  * Fully sorted data will appear as only a single run, no matter how large,
+  * so this is an overestimate in that case, but a safe assumption.
+  *
   * The disk traffic is assumed to be half sequential and half random
   * accesses (XXX can't we refine that guess?)
   *
***************
*** 824,834 ****
  	{
  		double		npages = ceil(nbytes / BLCKSZ);
  		double		nruns = (nbytes / work_mem_bytes) * 0.5;
! 		double		log_runs = ceil(LOG6(nruns));
  		double		npageaccesses;
  
! 		if (log_runs < 1.0)
! 			log_runs = 1.0;
  		npageaccesses = 2.0 * npages * log_runs;
  		/* Assume half are sequential (cost 1), half are not */
  		startup_cost += npageaccesses *
--- 828,847 ----
  	{
  		double		npages = ceil(nbytes / BLCKSZ);
  		double		nruns = (nbytes / work_mem_bytes) * 0.5;
! 		double		ntapes = (work_mem_bytes / OPTIMAL_MERGE_BUFFER_SIZE);
! 		double		log_runs;
  		double		npageaccesses;
  
! 		/*
! 		 * The number of merge passes varies with logT(nruns).
! 		 * We use the change-of-base property of logarithms to
! 		 * calculate the log in a variable base.
! 		 */
! 		if (ntapes < nruns)
! 			log_runs = ceil(log(nruns) / log(ntapes));
! 		else
! 			log_runs = 1.0;
! 
  		npageaccesses = 2.0 * npages * log_runs;
  		/* Assume half are sequential (cost 1), half are not */
  		startup_cost += npageaccesses *
