This was a suggestion made back in March that would dramatically reduce the overhead of EXPLAIN ANALYZE on queries that loop continuously over the same nodes.
http://archives.postgresql.org/pgsql-hackers/2006-03/msg01114.php What it does behave normally for the first 50 tuples of any node, but after that it starts sampling at ever increasing intervals, the intervals controlled by an exponential function. So for a node iterating over 1 million tuples it takes around 15,000 samples. The result is that EXPLAIN ANALYZE has a much reduced effect on the total execution time. Without EXPLAIN ANALYZE: postgres=# select count(*) from generate_series(1,1000000); count --------- 1000000 (1 row) Time: 2303.599 ms EXPLAIN ANALYZE without patch: postgres=# explain analyze select count(*) from generate_series(1,1000000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=15.00..15.01 rows=1 width=0) (actual time=8022.070..8022.073 rows=1 loops=1) -> Function Scan on generate_series (cost=0.00..12.50 rows=1000 width=0) (actual time=1381.762..4873.369 rows=1000000 loops=1) Total runtime: 8042.472 ms (3 rows) Time: 8043.401 ms EXPLAIN ANALYZE with patch: postgres=# explain analyze select count(*) from generate_series(1,1000000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=15.00..15.01 rows=1 width=0) (actual time=2469.491..2469.494 rows=1 loops=1) -> Function Scan on generate_series (cost=0.00..12.50 rows=1000 width=0) (actual time=1405.002..2187.417 rows=1000000 loops=1) Total runtime: 2496.529 ms (3 rows) Time: 2497.488 ms As you can see, the overhead goes from 5.7 seconds to 0.2 seconds. Obviously this is an extreme case, but it will probably help in a lot of other cases people have been complaining about. - To get this close it needs to get an estimate of the sampling overhead. It does this by a little calibration loop that is run once per backend. If you don't do this, you end up assuming all tuples take the same time as tuples with the overhead, resulting in nodes apparently taking longer than their parent nodes. Incidently, I measured the overhead to be about 3.6us per tuple per node on my (admittedly slightly old) machine. Note that the resulting times still include the overhead actually incurred, I didn't filter it out. I want the times to remain reflecting reality as closely as possible. - I also removed InstrStopNodeMulti and made InstrStopNode take a tuple count parameter instead. This is much clearer all round. - I also didn't make it optional. I'm unsure about whether it should be optional or not, given the number of cases where it will make a difference to be very few. - The tuple counter for sampling restarts every loop. Thus a node that is called repeatedly only returning one value each time will still measure every tuple, though its parent node won't. We'll need some field testing to see if that remains a significant effect. - I don't let the user know anywhere how many samples it took. Is this something users should care about? Any comments? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate.
Index: src/backend/commands/trigger.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/commands/trigger.c,v retrieving revision 1.201 diff -c -r1.201 trigger.c *** src/backend/commands/trigger.c 27 Apr 2006 00:33:41 -0000 1.201 --- src/backend/commands/trigger.c 9 May 2006 20:35:07 -0000 *************** *** 1306,1312 **** * one "tuple returned" (really the number of firings). */ if (instr) ! InstrStopNode(instr + tgindx, true); return (HeapTuple) DatumGetPointer(result); } --- 1306,1312 ---- * one "tuple returned" (really the number of firings). */ if (instr) ! InstrStopNode(instr + tgindx, 1); return (HeapTuple) DatumGetPointer(result); } *************** *** 2154,2160 **** * one "tuple returned" (really the number of firings). */ if (instr) ! InstrStopNode(instr + tgindx, true); } --- 2154,2160 ---- * one "tuple returned" (really the number of firings). */ if (instr) ! InstrStopNode(instr + tgindx, 1); } Index: src/backend/executor/execProcnode.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/executor/execProcnode.c,v retrieving revision 1.54 diff -c -r1.54 execProcnode.c *** src/backend/executor/execProcnode.c 5 Mar 2006 15:58:26 -0000 1.54 --- src/backend/executor/execProcnode.c 9 May 2006 20:35:07 -0000 *************** *** 423,429 **** } if (node->instrument) ! InstrStopNode(node->instrument, !TupIsNull(result)); return result; } --- 423,429 ---- } if (node->instrument) ! InstrStopNode(node->instrument, TupIsNull(result) ? 0 : 1 ); return result; } Index: src/backend/executor/instrument.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/executor/instrument.c,v retrieving revision 1.14 diff -c -r1.14 instrument.c *** src/backend/executor/instrument.c 5 Mar 2006 15:58:26 -0000 1.14 --- src/backend/executor/instrument.c 9 May 2006 20:35:07 -0000 *************** *** 14,22 **** --- 14,84 ---- #include "postgres.h" #include <unistd.h> + #include <math.h> #include "executor/instrument.h" + /* This is the function that is used to determine the sampling intervals. In + * general, if the function is f(x), then for N tuples we will take on the + * order of integral(1/f(x), x=0..N) samples. Some examples follow, with the + * number of samples that would be collected over 1,000,000 tuples. + + f(x) = x => log2(N) 20 + f(x) = x^(1/2) => 2 * N^(1/2) 2000 + f(x) = x^(1/3) => 1.5 * N^(2/3) 15000 + + * I've chosen the last one as it seems to provide a good compromise between + * low overhead but still getting a meaningful number of samples. However, + * not all machines have the cbrt() function so on those we substitute + * sqrt(). The difference is not very significant in the tests I made. + */ + #ifdef HAVE_CBRT + #define SampleFunc cbrt + #else + #define SampleFunc sqrt + #endif + + #define SAMPLE_THRESHOLD 50 + + static double SampleOverhead; + static bool SampleOverheadCalculated; + + static void + CalculateSampleOverhead() + { + Instrumentation instr; + int i; + + /* We want to determine the sampling overhead, to correct + * calculations later. This only needs to be done once per backend. + * Is this the place? A wrong value here (due to a mistimed + * task-switch) will cause bad calculations later. + * + * To minimize the risk we do it a few times and take the lowest. + */ + + SampleOverhead = 1.0e6; + + for( i = 0; i<5; i++ ) + { + int j; + double overhead; + + memset( &instr, 0, sizeof(instr) ); + + /* Loop SAMPLE_THRESHOLD times or 100 microseconds, whichever is faster */ + for( j=0; j<SAMPLE_THRESHOLD && INSTR_TIME_GET_DOUBLE(instr.counter) < 100e-6; i++ ) + { + InstrStartNode( &instr ); + InstrStopNode( &instr, 1 ); + } + overhead = INSTR_TIME_GET_DOUBLE(instr.counter) / instr.samplecount; + if( overhead < SampleOverhead ) + SampleOverhead = overhead; + } + + SampleOverheadCalculated = true; + } /* Allocate new instrumentation structure(s) */ Instrumentation * *************** *** 25,31 **** Instrumentation *instr = palloc0(n * sizeof(Instrumentation)); /* we don't need to do any initialization except zero 'em */ ! return instr; } --- 87,96 ---- Instrumentation *instr = palloc0(n * sizeof(Instrumentation)); /* we don't need to do any initialization except zero 'em */ ! ! /* Calculate overhead, if not done yet */ ! if( !SampleOverheadCalculated ) ! CalculateSampleOverhead(); return instr; } *************** *** 34,82 **** InstrStartNode(Instrumentation *instr) { if (INSTR_TIME_IS_ZERO(instr->starttime)) ! INSTR_TIME_SET_CURRENT(instr->starttime); else elog(DEBUG2, "InstrStartNode called twice in a row"); } /* Exit from a plan node */ void ! InstrStopNode(Instrumentation *instr, bool returnedTuple) { instr_time endtime; /* count the returned tuples */ ! if (returnedTuple) ! instr->tuplecount += 1; ! if (INSTR_TIME_IS_ZERO(instr->starttime)) { ! elog(DEBUG2, "InstrStopNode called without start"); ! return; ! } ! INSTR_TIME_SET_CURRENT(endtime); #ifndef WIN32 ! instr->counter.tv_sec += endtime.tv_sec - instr->starttime.tv_sec; ! instr->counter.tv_usec += endtime.tv_usec - instr->starttime.tv_usec; ! /* Normalize after each add to avoid overflow/underflow of tv_usec */ ! while (instr->counter.tv_usec < 0) ! { ! instr->counter.tv_usec += 1000000; ! instr->counter.tv_sec--; ! } ! while (instr->counter.tv_usec >= 1000000) ! { ! instr->counter.tv_usec -= 1000000; ! instr->counter.tv_sec++; ! } #else /* WIN32 */ ! instr->counter.QuadPart += (endtime.QuadPart - instr->starttime.QuadPart); #endif ! INSTR_TIME_SET_ZERO(instr->starttime); /* Is this the first tuple of this cycle? */ if (!instr->running) --- 99,167 ---- InstrStartNode(Instrumentation *instr) { if (INSTR_TIME_IS_ZERO(instr->starttime)) ! { ! /* We always sample the first SAMPLE_THRESHOLD tuples, so small nodes are always accurate */ ! if (instr->tuplecount < SAMPLE_THRESHOLD) ! instr->sampling = true; ! else ! { ! /* Otherwise we go to sampling, see the comments on SampleFunc at the top of the file */ ! if( instr->tuplecount > instr->nextsample ) ! { ! instr->sampling = true; ! /* The doubling is so the random will average 1 over time */ ! instr->nextsample += 2.0 * SampleFunc(instr->tuplecount) * (double)rand() / (double)RAND_MAX; ! } ! } ! if (instr->sampling) ! INSTR_TIME_SET_CURRENT(instr->starttime); ! } else elog(DEBUG2, "InstrStartNode called twice in a row"); } /* Exit from a plan node */ void ! InstrStopNode(Instrumentation *instr, double nTuples) { instr_time endtime; /* count the returned tuples */ ! instr->tuplecount += nTuples; ! if (instr->sampling) { ! if (INSTR_TIME_IS_ZERO(instr->starttime)) ! { ! elog(DEBUG2, "InstrStopNode called without start"); ! return; ! } ! INSTR_TIME_SET_CURRENT(endtime); #ifndef WIN32 ! instr->counter.tv_sec += endtime.tv_sec - instr->starttime.tv_sec; ! instr->counter.tv_usec += endtime.tv_usec - instr->starttime.tv_usec; ! /* Normalize after each add to avoid overflow/underflow of tv_usec */ ! while (instr->counter.tv_usec < 0) ! { ! instr->counter.tv_usec += 1000000; ! instr->counter.tv_sec--; ! } ! while (instr->counter.tv_usec >= 1000000) ! { ! instr->counter.tv_usec -= 1000000; ! instr->counter.tv_sec++; ! } #else /* WIN32 */ ! instr->counter.QuadPart += (endtime.QuadPart - instr->starttime.QuadPart); #endif ! INSTR_TIME_SET_ZERO(instr->starttime); ! instr->samplecount += nTuples; ! instr->sampling = false; ! } /* Is this the first tuple of this cycle? */ if (!instr->running) *************** *** 86,102 **** } } - /* As above, but count multiple tuples returned at once */ - void - InstrStopNodeMulti(Instrumentation *instr, double nTuples) - { - /* count the returned tuples */ - instr->tuplecount += nTuples; - - /* delegate the rest */ - InstrStopNode(instr, false); - } - /* Finish a run cycle for a plan node */ void InstrEndLoop(Instrumentation *instr) --- 171,176 ---- *************** *** 114,121 **** totaltime = INSTR_TIME_GET_DOUBLE(instr->counter); instr->startup += instr->firsttuple; ! instr->total += totaltime; instr->ntuples += instr->tuplecount; instr->nloops += 1; /* Reset for next cycle (if any) */ --- 188,211 ---- totaltime = INSTR_TIME_GET_DOUBLE(instr->counter); instr->startup += instr->firsttuple; ! ! /* Here we take into account sampling effects. Doing it naively ends ! * up assuming the sampling overhead applies to all tuples, even the ! * ones we didn't measure. We've calculated an overhead, so we ! * subtract that for all samples we didn't measure. The first tuple ! * is also special cased, because it usually takes longer. */ ! ! if( instr->samplecount < instr->tuplecount ) ! { ! double pertuple = (totaltime - instr->firsttuple) / (instr->samplecount - 1); ! instr->total += instr->firsttuple + (pertuple * (instr->samplecount - 1)) ! + ((pertuple - SampleOverhead) * (instr->tuplecount - instr->samplecount)); ! } ! else ! instr->total += totaltime; ! instr->ntuples += instr->tuplecount; + instr->nsamples += instr->samplecount; instr->nloops += 1; /* Reset for next cycle (if any) */ *************** *** 123,127 **** --- 213,218 ---- INSTR_TIME_SET_ZERO(instr->starttime); INSTR_TIME_SET_ZERO(instr->counter); instr->firsttuple = 0; + instr->samplecount = 0; instr->tuplecount = 0; } Index: src/backend/executor/nodeBitmapAnd.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeBitmapAnd.c,v retrieving revision 1.6 diff -c -r1.6 nodeBitmapAnd.c *** src/backend/executor/nodeBitmapAnd.c 5 Mar 2006 15:58:26 -0000 1.6 --- src/backend/executor/nodeBitmapAnd.c 9 May 2006 20:35:07 -0000 *************** *** 163,169 **** /* must provide our own instrumentation support */ if (node->ps.instrument) ! InstrStopNodeMulti(node->ps.instrument, 0 /* XXX */ ); return (Node *) result; } --- 163,169 ---- /* must provide our own instrumentation support */ if (node->ps.instrument) ! InstrStopNode(node->ps.instrument, 0 /* XXX */ ); return (Node *) result; } Index: src/backend/executor/nodeBitmapIndexscan.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeBitmapIndexscan.c,v retrieving revision 1.17 diff -c -r1.17 nodeBitmapIndexscan.c *** src/backend/executor/nodeBitmapIndexscan.c 5 Mar 2006 15:58:26 -0000 1.17 --- src/backend/executor/nodeBitmapIndexscan.c 9 May 2006 20:35:07 -0000 *************** *** 112,118 **** /* must provide our own instrumentation support */ if (node->ss.ps.instrument) ! InstrStopNodeMulti(node->ss.ps.instrument, nTuples); return (Node *) tbm; } --- 112,118 ---- /* must provide our own instrumentation support */ if (node->ss.ps.instrument) ! InstrStopNode(node->ss.ps.instrument, nTuples); return (Node *) tbm; } Index: src/backend/executor/nodeBitmapOr.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeBitmapOr.c,v retrieving revision 1.5 diff -c -r1.5 nodeBitmapOr.c *** src/backend/executor/nodeBitmapOr.c 5 Mar 2006 15:58:26 -0000 1.5 --- src/backend/executor/nodeBitmapOr.c 9 May 2006 20:35:07 -0000 *************** *** 179,185 **** /* must provide our own instrumentation support */ if (node->ps.instrument) ! InstrStopNodeMulti(node->ps.instrument, 0 /* XXX */ ); return (Node *) result; } --- 179,185 ---- /* must provide our own instrumentation support */ if (node->ps.instrument) ! InstrStopNode(node->ps.instrument, 0 /* XXX */ ); return (Node *) result; } Index: src/backend/executor/nodeHash.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/executor/nodeHash.c,v retrieving revision 1.101 diff -c -r1.101 nodeHash.c *** src/backend/executor/nodeHash.c 5 Mar 2006 15:58:26 -0000 1.101 --- src/backend/executor/nodeHash.c 9 May 2006 20:35:07 -0000 *************** *** 97,103 **** /* must provide our own instrumentation support */ if (node->ps.instrument) ! InstrStopNodeMulti(node->ps.instrument, hashtable->totalTuples); /* * We do not return the hash table directly because it's not a subtype of --- 97,103 ---- /* must provide our own instrumentation support */ if (node->ps.instrument) ! InstrStopNode(node->ps.instrument, hashtable->totalTuples); /* * We do not return the hash table directly because it's not a subtype of Index: src/include/executor/instrument.h =================================================================== RCS file: /projects/cvsroot/pgsql/src/include/executor/instrument.h,v retrieving revision 1.13 diff -c -r1.13 instrument.h *** src/include/executor/instrument.h 5 Mar 2006 15:58:56 -0000 1.13 --- src/include/executor/instrument.h 9 May 2006 20:35:09 -0000 *************** *** 61,77 **** instr_time counter; /* Accumulated runtime for this node */ double firsttuple; /* Time for first tuple of this cycle */ double tuplecount; /* Tuples emitted so far this cycle */ /* Accumulated statistics across all completed cycles: */ double startup; /* Total startup time (in seconds) */ double total; /* Total total time (in seconds) */ double ntuples; /* Total tuples produced */ double nloops; /* # of run cycles for this node */ } Instrumentation; extern Instrumentation *InstrAlloc(int n); extern void InstrStartNode(Instrumentation *instr); ! extern void InstrStopNode(Instrumentation *instr, bool returnedTuple); ! extern void InstrStopNodeMulti(Instrumentation *instr, double nTuples); extern void InstrEndLoop(Instrumentation *instr); #endif /* INSTRUMENT_H */ --- 61,81 ---- instr_time counter; /* Accumulated runtime for this node */ double firsttuple; /* Time for first tuple of this cycle */ double tuplecount; /* Tuples emitted so far this cycle */ + double samplecount; /* Samples collected this cycle */ /* Accumulated statistics across all completed cycles: */ double startup; /* Total startup time (in seconds) */ double total; /* Total total time (in seconds) */ double ntuples; /* Total tuples produced */ double nloops; /* # of run cycles for this node */ + double nsamples; /* # of samples taken */ + /* Tracking for sampling */ + bool sampling; /* Are we sampling this iteration */ + double nextsample; /* The next tuplecount we're going to sample */ } Instrumentation; extern Instrumentation *InstrAlloc(int n); extern void InstrStartNode(Instrumentation *instr); ! extern void InstrStopNode(Instrumentation *instr, double nTuples); extern void InstrEndLoop(Instrumentation *instr); #endif /* INSTRUMENT_H */
signature.asc
Description: Digital signature