Hi, On 2023-01-17 08:46:12 -0500, Robert Haas wrote: > On Fri, Jan 13, 2023 at 2:56 PM Andres Freund <and...@anarazel.de> wrote: > > Does anybody see a reason to not move forward with this aspect? We do a fair > > amount of INSTR_TIME_ACCUM_DIFF() etc, and that gets a good bit cheaper by > > just using nanoseconds. We'd also save memory in BufferUsage (144-122 > > bytes), > > Instrumentation (16 bytes saved in Instrumentation itself, 32 via > > BufferUsage).
Here's an updated version of the move to representing instr_time as nanoseconds. It's now split into a few patches: 0001) Add INSTR_TIME_SET_ZERO() calls where otherwise 0002 causes gcc to warn Alternatively we can decide to deprecate INSTR_TIME_SET_ZERO() and just allow to assign 0. 0002) Convert instr_time to uint64 This is the cleaned up version of the prior patch. The main change is that it deduplicated a lot of the code between the architectures. 0003) Add INSTR_TIME_SET_SECOND() This is used in 0004. Just allows setting an instr_time to a time in seconds, allowing for a cheaper loop exit condition in 0004. 0004) report nanoseconds in pg_test_timing I also couldn't help and hacked a bit on the rdtsc pieces. I did figure out how to do the cycles->nanosecond conversion with integer shift and multiply in the common case, which does show a noticable speedup. But that's for another day. I fought a bit with myself about whether to send those patches in this thread, because it'll take over the CF entry. But decided that it's ok, given that David's patches should be rebased over these anyway? > I read through 0001 and it seems basically fine to me. Comments: > > 1. pg_clock_gettime_ns() doesn't follow pgindent conventions. Fixed. > 2. I'm not entirely sure that the new .?S_PER_.?S macros are > worthwhile but maybe they are, and in any case I don't care very much. There's now fewer. But those I'd like to keep. I just end up counting digits manually way too many times. > 3. I've always found 'struct timespec' to be pretty annoying > notationally, so I like the fact that this patch would reduce use of > it. Same. Greetings, Andres Freund
>From c1024a9dfa7f5645200b7fa68e8bce5561c9cee0 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Mon, 16 Jan 2023 10:04:42 -0800 Subject: [PATCH v7 1/4] Zero initialize instr_time uses causing compiler warnings These are all not necessary from a correctness POV. However, in a subsequent patch instr_time will be simplified to an int64, at which point gcc would otherwise start to warn about the changed places. Author: Reviewed-by: Discussion: https://postgr.es/m/20230116023639.rn36vf6ajqmfc...@awork3.anarazel.de Backpatch: --- src/backend/access/transam/xlog.c | 4 ++++ src/backend/storage/buffer/bufmgr.c | 4 ++++ src/backend/storage/file/buffile.c | 4 ++++ src/bin/psql/common.c | 6 ++++++ 4 files changed, 18 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8f47fb75700..7d65b1d4159 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2191,6 +2191,8 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) /* Measure I/O timing to write WAL data */ if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); + else + INSTR_TIME_SET_ZERO(start); pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE); written = pg_pwrite(openLogFile, from, nleft, startoffset); @@ -8150,6 +8152,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) /* Measure I/O timing to sync the WAL file */ if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); + else + INSTR_TIME_SET_ZERO(start); pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC); switch (sync_method) diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 8075828e8a6..800a4248c95 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -1017,6 +1017,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, if (track_io_timing) INSTR_TIME_SET_CURRENT(io_start); + else + INSTR_TIME_SET_ZERO(io_start); smgrread(smgr, forkNum, blockNum, (char *) bufBlock); @@ -2902,6 +2904,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln) if (track_io_timing) INSTR_TIME_SET_CURRENT(io_start); + else + INSTR_TIME_SET_ZERO(io_start); /* * bufToWrite is either the shared buffer or a copy, as appropriate. diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c index c5464b6aa62..0a51624df3b 100644 --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -446,6 +446,8 @@ BufFileLoadBuffer(BufFile *file) if (track_io_timing) INSTR_TIME_SET_CURRENT(io_start); + else + INSTR_TIME_SET_ZERO(io_start); /* * Read whatever we can get, up to a full bufferload. @@ -525,6 +527,8 @@ BufFileDumpBuffer(BufFile *file) if (track_io_timing) INSTR_TIME_SET_CURRENT(io_start); + else + INSTR_TIME_SET_ZERO(io_start); bytestowrite = FileWrite(thisfile, file->buffer.data + wpos, diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 00627830c47..f907f5d4e8d 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -1276,6 +1276,8 @@ DescribeQuery(const char *query, double *elapsed_msec) if (timing) INSTR_TIME_SET_CURRENT(before); + else + INSTR_TIME_SET_ZERO(before); /* * To parse the query but not execute it, we prepare it, using the unnamed @@ -1406,6 +1408,8 @@ ExecQueryAndProcessResults(const char *query, if (timing) INSTR_TIME_SET_CURRENT(before); + else + INSTR_TIME_SET_ZERO(before); if (pset.bind_flag) success = PQsendQueryParams(pset.db, query, pset.bind_nparams, NULL, (const char * const *) pset.bind_params, NULL, NULL, 0); @@ -1702,6 +1706,8 @@ ExecQueryUsingCursor(const char *query, double *elapsed_msec) if (timing) INSTR_TIME_SET_CURRENT(before); + else + INSTR_TIME_SET_ZERO(before); /* if we're not in a transaction, start one */ if (PQtransactionStatus(pset.db) == PQTRANS_IDLE) -- 2.38.0
>From d76727601cff3885988969080b4372e7449478d7 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Mon, 16 Jan 2023 20:38:12 -0800 Subject: [PATCH v7 2/4] Use int64 to represent instr_time on all platforms Until now we used struct timespec on all platforms but windows. That causes a fair bit of memory (struct timeval is 16 bytes) and runtime overhead (much more complicated additions). Instead we can convert the time to nanoseconds in INSTR_TIME_SET_CURRENT(), making the remaining operations cheaper. Representing time as int64 nanoseconds provides sufficient range, ~292 years relative to a starting point (depending on clock source, relative to the unix epoch or the system's boot time). That'd not be sufficient for calendar time stored on disk, but is plenty for runtime interval time measurement. On windows instr_time already is represented as cycles. It might make sense to represent time as cycles on other platforms as well, as using cycle acquisition instructions like rdtsc directly can reduce the overhead of time acquisition substantially. This could be done in a fairly localized manner as the code stands after this commit. Because the windows and non-windows paths are now more similar, use a common set of macros. To make that possible, most of the use of LARGE_INTEGER had to be removed, which looks nicer anyway. Author: Andres Freund <and...@anarazel.de> Author: Lukas Fittl <lu...@fittl.com> Author: David Geier <geidav...@gmail.com> Reviewed-by: Discussion: https://postgr.es/m/20230113195547.k4nlrmawpijqw...@awork3.anarazel.de --- src/include/portability/instr_time.h | 149 +++++++++++++-------------- 1 file changed, 73 insertions(+), 76 deletions(-) diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h index 9ea1a68bd94..185be8d5cad 100644 --- a/src/include/portability/instr_time.h +++ b/src/include/portability/instr_time.h @@ -34,6 +34,8 @@ * * INSTR_TIME_GET_MICROSEC(t) convert t to uint64 (in microseconds) * + * INSTR_TIME_GET_NANOSEC(t) convert t to uint64 (in nanoseconds) + * * Note that INSTR_TIME_SUBTRACT and INSTR_TIME_ACCUM_DIFF convert * absolute times to intervals. The INSTR_TIME_GET_xxx operations are * only useful on intervals. @@ -54,8 +56,26 @@ #ifndef INSTR_TIME_H #define INSTR_TIME_H + +/* + * On all platforms we store time using a 64bit integer - they're cheap to + * add/subtract, the most common operations. The acquisition of time + * and converting to specific units of time is platform specific. + */ + +typedef int64 instr_time; + + +/* helpers macros used in platform specific code below */ + +#define NS_PER_S INT64CONST(1000000000) +#define NS_PER_MS INT64CONST(1000000) +#define NS_PER_US INT64CONST(1000) + + #ifndef WIN32 + /* Use clock_gettime() */ #include <time.h> @@ -80,93 +100,39 @@ #define PG_INSTR_CLOCK CLOCK_REALTIME #endif -typedef struct timespec instr_time; +/* helper for INSTR_TIME_SET_CURRENT */ +static inline instr_time +pg_clock_gettime_ns(void) +{ + struct timespec tmp; -#define INSTR_TIME_IS_ZERO(t) ((t).tv_nsec == 0 && (t).tv_sec == 0) + clock_gettime(PG_INSTR_CLOCK, &tmp); -#define INSTR_TIME_SET_ZERO(t) ((t).tv_sec = 0, (t).tv_nsec = 0) + return tmp.tv_sec * NS_PER_S + tmp.tv_nsec; +} -#define INSTR_TIME_SET_CURRENT(t) ((void) clock_gettime(PG_INSTR_CLOCK, &(t))) +#define INSTR_TIME_SET_CURRENT(t) \ + ((t) = pg_clock_gettime_ns()) -#define INSTR_TIME_ADD(x,y) \ - do { \ - (x).tv_sec += (y).tv_sec; \ - (x).tv_nsec += (y).tv_nsec; \ - /* Normalize */ \ - while ((x).tv_nsec >= 1000000000) \ - { \ - (x).tv_nsec -= 1000000000; \ - (x).tv_sec++; \ - } \ - } while (0) +#define INSTR_TIME_GET_NANOSEC(t) \ + ((uint64) (t)) -#define INSTR_TIME_SUBTRACT(x,y) \ - do { \ - (x).tv_sec -= (y).tv_sec; \ - (x).tv_nsec -= (y).tv_nsec; \ - /* Normalize */ \ - while ((x).tv_nsec < 0) \ - { \ - (x).tv_nsec += 1000000000; \ - (x).tv_sec--; \ - } \ - } while (0) - -#define INSTR_TIME_ACCUM_DIFF(x,y,z) \ - do { \ - (x).tv_sec += (y).tv_sec - (z).tv_sec; \ - (x).tv_nsec += (y).tv_nsec - (z).tv_nsec; \ - /* Normalize after each add to avoid overflow/underflow of tv_nsec */ \ - while ((x).tv_nsec < 0) \ - { \ - (x).tv_nsec += 1000000000; \ - (x).tv_sec--; \ - } \ - while ((x).tv_nsec >= 1000000000) \ - { \ - (x).tv_nsec -= 1000000000; \ - (x).tv_sec++; \ - } \ - } while (0) - -#define INSTR_TIME_GET_DOUBLE(t) \ - (((double) (t).tv_sec) + ((double) (t).tv_nsec) / 1000000000.0) - -#define INSTR_TIME_GET_MILLISEC(t) \ - (((double) (t).tv_sec * 1000.0) + ((double) (t).tv_nsec) / 1000000.0) - -#define INSTR_TIME_GET_MICROSEC(t) \ - (((uint64) (t).tv_sec * (uint64) 1000000) + (uint64) ((t).tv_nsec / 1000)) #else /* WIN32 */ + /* Use QueryPerformanceCounter() */ -typedef LARGE_INTEGER instr_time; +/* helper for INSTR_TIME_SET_CURRENT */ +static inline instr_time +pg_query_performance_counter(void) +{ + LARGE_INTEGER t; -#define INSTR_TIME_IS_ZERO(t) ((t).QuadPart == 0) + QueryPerformanceCounter(&t); -#define INSTR_TIME_SET_ZERO(t) ((t).QuadPart = 0) - -#define INSTR_TIME_SET_CURRENT(t) QueryPerformanceCounter(&(t)) - -#define INSTR_TIME_ADD(x,y) \ - ((x).QuadPart += (y).QuadPart) - -#define INSTR_TIME_SUBTRACT(x,y) \ - ((x).QuadPart -= (y).QuadPart) - -#define INSTR_TIME_ACCUM_DIFF(x,y,z) \ - ((x).QuadPart += (y).QuadPart - (z).QuadPart) - -#define INSTR_TIME_GET_DOUBLE(t) \ - (((double) (t).QuadPart) / GetTimerFrequency()) - -#define INSTR_TIME_GET_MILLISEC(t) \ - (((double) (t).QuadPart * 1000.0) / GetTimerFrequency()) - -#define INSTR_TIME_GET_MICROSEC(t) \ - ((uint64) (((double) (t).QuadPart * 1000000.0) / GetTimerFrequency())) + return t.QuadPart; +} static inline double GetTimerFrequency(void) @@ -177,11 +143,42 @@ GetTimerFrequency(void) return (double) f.QuadPart; } +#define INSTR_TIME_SET_CURRENT(t) \ + ((t) = pg_query_performance_counter()) + +#define INSTR_TIME_GET_NANOSEC(t) \ + ((uint64) (((t) * NS_PER_S) / GetTimerFrequency())) + #endif /* WIN32 */ -/* same macro on all platforms */ + +/* + * Common macros + */ + +#define INSTR_TIME_IS_ZERO(t) ((t) == 0) + +#define INSTR_TIME_SET_ZERO(t) ((t) = 0) #define INSTR_TIME_SET_CURRENT_LAZY(t) \ (INSTR_TIME_IS_ZERO(t) ? INSTR_TIME_SET_CURRENT(t), true : false) +#define INSTR_TIME_GET_DOUBLE(t) \ + ((double) INSTR_TIME_GET_NANOSEC(t) / NS_PER_S) + +#define INSTR_TIME_GET_MILLISEC(t) \ + ((double) INSTR_TIME_GET_NANOSEC(t) / NS_PER_MS) + +#define INSTR_TIME_GET_MICROSEC(t) \ + (INSTR_TIME_GET_NANOSEC(t) / NS_PER_US) + +#define INSTR_TIME_ADD(x,y) \ + ((x) += (y)) + +#define INSTR_TIME_SUBTRACT(x,y) \ + ((x) -= (y)) + +#define INSTR_TIME_ACCUM_DIFF(x,y,z) \ + ((x) += (y) - (z)) + #endif /* INSTR_TIME_H */ -- 2.38.0
>From 3ed50650be12bb4da1baab3936e5e4f6346ef5b3 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Mon, 16 Jan 2023 11:06:36 -0800 Subject: [PATCH v7 3/4] instr_time: Add INSTR_TIME_SET_SECOND() Useful to calculate the end of a time-bound loop without having to convert into time units (which is costly). Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch: --- src/include/portability/instr_time.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h index 185be8d5cad..91492c62a39 100644 --- a/src/include/portability/instr_time.h +++ b/src/include/portability/instr_time.h @@ -22,6 +22,8 @@ * INSTR_TIME_SET_CURRENT_LAZY(t) set t to current time if t is zero, * evaluates to whether t changed * + * INSTR_TIME_SET_SECOND(t, s) set t to s seconds + * * INSTR_TIME_ADD(x, y) x += y * * INSTR_TIME_SUBTRACT(x, y) x -= y @@ -114,6 +116,9 @@ pg_clock_gettime_ns(void) #define INSTR_TIME_SET_CURRENT(t) \ ((t) = pg_clock_gettime_ns()) +#define INSTR_TIME_SET_SECONDS(t, s) \ + ((t) = NS_PER_S * (s)) + #define INSTR_TIME_GET_NANOSEC(t) \ ((uint64) (t)) @@ -146,6 +151,9 @@ GetTimerFrequency(void) #define INSTR_TIME_SET_CURRENT(t) \ ((t) = pg_query_performance_counter()) +#define INSTR_TIME_SET_SECONDS(t, s) \ + ((t) = s * GetTimerFrequency()) + #define INSTR_TIME_GET_NANOSEC(t) \ ((uint64) (((t) * NS_PER_S) / GetTimerFrequency())) -- 2.38.0
>From 8d320cb8d0a6e35280134fa927c0297e407bf444 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Mon, 16 Jan 2023 11:19:11 -0800 Subject: [PATCH v7 4/4] wip: report nanoseconds in pg_test_timing This commit also updates pg_test_timing's documentation: - compare EXPLAIN (ANALYZE, TIMING ON/OFF) instead of comparing performance of of statement with/without EXPLAIN ANALYZE - explain the 2x overhead (due to two timestamp acquisitions per row) - remove old section about old versions of linux - I couldn't update the numbers, and it's old enough nobody would care --- src/bin/pg_test_timing/pg_test_timing.c | 72 ++++++++------ doc/src/sgml/ref/pgtesttiming.sgml | 122 +++++++++++------------- 2 files changed, 99 insertions(+), 95 deletions(-) diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c index c29d6f87629..bae10ce6fa4 100644 --- a/src/bin/pg_test_timing/pg_test_timing.c +++ b/src/bin/pg_test_timing/pg_test_timing.c @@ -19,8 +19,8 @@ static void handle_args(int argc, char *argv[]); static uint64 test_timing(unsigned int duration); static void output(uint64 loop_count); -/* record duration in powers of 2 microseconds */ -long long int histogram[32]; +/* record duration in powers of 2 nanoseconds */ +uint64 histogram[64]; int main(int argc, char *argv[]) @@ -122,34 +122,47 @@ static uint64 test_timing(unsigned int duration) { uint64 total_time; - int64 time_elapsed = 0; uint64 loop_count = 0; - uint64 prev, - cur; + instr_time cur; instr_time start_time, - end_time, - temp; - - total_time = duration > 0 ? duration * INT64CONST(1000000) : 0; + end_time; + instr_time until_time; INSTR_TIME_SET_CURRENT(start_time); - cur = INSTR_TIME_GET_MICROSEC(start_time); - while (time_elapsed < total_time) + /* + * To reduce loop overhead, check loop condition in instr_time domain. + */ + INSTR_TIME_SET_SECONDS(total_time, duration); + until_time = start_time; + INSTR_TIME_ADD(until_time, total_time); + + cur = start_time; + + while (cur < until_time) { - int32 diff, - bits = 0; + instr_time temp; + instr_time prev; + int64 diff; + int32 bits = 0; prev = cur; - INSTR_TIME_SET_CURRENT(temp); - cur = INSTR_TIME_GET_MICROSEC(temp); - diff = cur - prev; + INSTR_TIME_SET_CURRENT(cur); + temp = cur; + INSTR_TIME_SUBTRACT(temp, prev); + diff = INSTR_TIME_GET_NANOSEC(temp); /* Did time go backwards? */ - if (diff < 0) + if (unlikely(diff <= 0)) { + /* can't do anything with that measurement */ + if (diff == 0) + { + loop_count++; + continue; + } fprintf(stderr, _("Detected clock going backwards in time.\n")); - fprintf(stderr, _("Time warp: %d ms\n"), diff); + fprintf(stderr, _("Time warp: %lld ns\n"), (long long) diff); exit(1); } @@ -164,8 +177,6 @@ test_timing(unsigned int duration) histogram[bits]++; loop_count++; - INSTR_TIME_SUBTRACT(temp, start_time); - time_elapsed = INSTR_TIME_GET_MICROSEC(temp); } INSTR_TIME_SET_CURRENT(end_time); @@ -173,7 +184,7 @@ test_timing(unsigned int duration) INSTR_TIME_SUBTRACT(end_time, start_time); printf(_("Per loop time including overhead: %0.2f ns\n"), - INSTR_TIME_GET_DOUBLE(end_time) * 1e9 / loop_count); + (INSTR_TIME_GET_DOUBLE(end_time) * NS_PER_S) / loop_count); return loop_count; } @@ -181,9 +192,10 @@ test_timing(unsigned int duration) static void output(uint64 loop_count) { - int64 max_bit = 31, + int64 low_bit = 0, + max_bit = 63, i; - char *header1 = _("< us"); + char *header1 = _("< ns"); char *header2 = /* xgettext:no-c-format */ _("% of total"); char *header3 = _("count"); int len1 = strlen(header1); @@ -194,15 +206,19 @@ output(uint64 loop_count) while (max_bit > 0 && histogram[max_bit] == 0) max_bit--; + /* find lowest bit value */ + while (low_bit < max_bit && histogram[low_bit] == 0) + low_bit++; + printf(_("Histogram of timing durations:\n")); printf("%*s %*s %*s\n", - Max(6, len1), header1, + Max(9, len1), header1, Max(10, len2), header2, Max(10, len3), header3); - for (i = 0; i <= max_bit; i++) - printf("%*ld %*.5f %*lld\n", - Max(6, len1), 1l << i, + for (i = low_bit; i <= max_bit; i++) + printf("%*ld %*.5f %*llu\n", + Max(9, len1), 1l << i, Max(10, len2) - 1, (double) histogram[i] * 100 / loop_count, - Max(10, len3), histogram[i]); + Max(10, len3), (long long unsigned) histogram[i]); } diff --git a/doc/src/sgml/ref/pgtesttiming.sgml b/doc/src/sgml/ref/pgtesttiming.sgml index a5eb3aa25e0..82c625e3a2a 100644 --- a/doc/src/sgml/ref/pgtesttiming.sgml +++ b/doc/src/sgml/ref/pgtesttiming.sgml @@ -93,28 +93,34 @@ PostgreSQL documentation <para> Good results will show most (>90%) individual timing calls take less than - one microsecond. Average per loop overhead will be even lower, below 100 - nanoseconds. This example from an Intel i7-860 system using a TSC clock - source shows excellent performance: + one microsecond (1000 nanoseconds). Average per loop overhead will be even + lower, below 100 nanoseconds. This example from an Intel i9-9880H system + using a TSC clock source shows excellent performance: <screen><![CDATA[ Testing timing overhead for 3 seconds. -Per loop time including overhead: 35.96 ns +Per loop time including overhead: 13.74 ns Histogram of timing durations: - < us % of total count - 1 96.40465 80435604 - 2 3.59518 2999652 - 4 0.00015 126 - 8 0.00002 13 - 16 0.00000 2 + < ns % of total count + 16 97.00221 211857215 + 32 2.99555 6542412 + 64 0.00115 2505 + 128 0.00035 759 + 256 0.00004 78 + 512 0.00000 3 + 1024 0.00000 4 + 2048 0.00034 732 + 4096 0.00000 6 + 8192 0.00000 8 + 16384 0.00019 409 + 32768 0.00018 403 + 65536 0.00000 1 ]]></screen> </para> <para> - Note that different units are used for the per loop time than the - histogram. The loop can have resolution within a few nanoseconds (ns), - while the individual timing calls can only resolve down to one microsecond - (us). + Note that the accuracy of the histogram entries may be lower than the + per loop time. </para> </refsect2> @@ -125,24 +131,30 @@ Histogram of timing durations: When the query executor is running a statement using <command>EXPLAIN ANALYZE</command>, individual operations are timed as well as showing a summary. The overhead of your system can be checked by - counting rows with the <application>psql</application> program: + disabling the per-row timing, using the <literal>TIMING OFF</literal> + option: <screen> -CREATE TABLE t AS SELECT * FROM generate_series(1,100000); -\timing -SELECT COUNT(*) FROM t; -EXPLAIN ANALYZE SELECT COUNT(*) FROM t; +CREATE TABLE t AS SELECT * FROM generate_series(1, 100000); +EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t; +EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t; </screen> </para> <para> - The i7-860 system measured runs the count query in 9.8 ms while - the <command>EXPLAIN ANALYZE</command> version takes 16.6 ms, each - processing just over 100,000 rows. That 6.8 ms difference means the timing - overhead per row is 68 ns, about twice what pg_test_timing estimated it - would be. Even that relatively small amount of overhead is making the fully - timed count statement take almost 70% longer. On more substantial queries, - the timing overhead would be less problematic. + The i9-9880H system measured shows an execution time of 4.116 ms for the + <literal>TIMING OFF</literal> query, and 6.965 ms for the + <literal>TIMING ON</literal>, each processing 100,000 rows. + + That 2.849 ms difference means the timing overhead per row is 28 ns. As + <literal>TIMING ON</literal> measures timestamps twice per row returned by + an executor node, the overhead is very close to what pg_test_timing + estimated it would be. + + more than what pg_test_timing estimated it would be. Even that relatively + small amount of overhead is making the fully timed count statement take + about 60% longer. On more substantial queries, the timing overhead would + be less problematic. </para> </refsect2> @@ -157,28 +169,31 @@ EXPLAIN ANALYZE SELECT COUNT(*) FROM t; <screen><![CDATA[ # cat /sys/devices/system/clocksource/clocksource0/available_clocksource -tsc hpet acpi_pm +tsc acpi_pm # echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource # pg_test_timing -Per loop time including overhead: 722.92 ns +Testing timing overhead for 3 seconds. +Per loop time including overhead: 708.58 ns Histogram of timing durations: - < us % of total count - 1 27.84870 1155682 - 2 72.05956 2990371 - 4 0.07810 3241 - 8 0.01357 563 - 16 0.00007 3 + < ns % of total count + 1024 99.79796 4225270 + 2048 0.15560 6588 + 4096 0.00035 15 + 8192 0.01738 736 + 16384 0.01679 711 + 32768 0.01190 504 ]]></screen> </para> <para> In this configuration, the sample <command>EXPLAIN ANALYZE</command> above - takes 115.9 ms. That's 1061 ns of timing overhead, again a small multiple - of what's measured directly by this utility. That much timing overhead - means the actual query itself is only taking a tiny fraction of the - accounted for time, most of it is being consumed in overhead instead. In - this configuration, any <command>EXPLAIN ANALYZE</command> totals involving - many timed operations would be inflated significantly by timing overhead. + shows an execution time of 148.7 ms. That's 1392 ns of timing overhead, + again a small multiple of what's measured directly by this utility. That + much timing overhead means the actual query itself is only taking a tiny + fraction of the accounted for time, most of it is being consumed in + overhead instead. In this configuration, any <command>EXPLAIN + ANALYZE</command> totals involving many timed operations would be inflated + significantly by timing overhead. </para> <para> @@ -196,33 +211,6 @@ kern.timecounter.hardware: ACPI-fast -> TSC </screen> </para> - <para> - Other systems may only allow setting the time source on boot. On older - Linux systems the "clock" kernel setting is the only way to make this sort - of change. And even on some more recent ones, the only option you'll see - for a clock source is "jiffies". Jiffies are the older Linux software clock - implementation, which can have good resolution when it's backed by fast - enough timing hardware, as in this example: - -<screen><![CDATA[ -$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource -jiffies -$ dmesg | grep time.c -time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer. -time.c: Detected 2400.153 MHz processor. -$ pg_test_timing -Testing timing overhead for 3 seconds. -Per timing duration including loop overhead: 97.75 ns -Histogram of timing durations: - < us % of total count - 1 90.23734 27694571 - 2 9.75277 2993204 - 4 0.00981 3010 - 8 0.00007 22 - 16 0.00000 1 - 32 0.00000 1 -]]></screen></para> - </refsect2> <refsect2> -- 2.38.0