Hi,
On 2026-04-08 21:36:48 -0700, Lukas Fittl wrote:
> On Wed, Apr 8, 2026 at 12:44 PM Andres Freund <[email protected]> wrote:
> >
> > > I half wonder if this could be a case of a Hypervisor, but not KVM or
> > > VMware, and so we fall through to the regular CPUID information (which
> > > AFAIR is different from how Linux itself handles that case, where
> > > it'll always do calibration in such cases). I think the solution might
> > > be to use the TSC calibration always on other hypervisors.
> >
> > Plausible.
> >
>
> And that is indeed the problem. Looks like we can't trust CPUID
> 0x15/0x16 when we're under a Hypervisor and its not KVM or VMware.
Why you'd report a TSC frequency but populate it with a distinct frequency
from the actual tsc is beyond me, but oh well, we gotta deal.
Pushed the fix.
> FWIW, earlier deprecated EC2 instance types that used Xen (e.g.
> m4.xlarge) just report "TSC unusable" on Windows, presumably because
> its not an invariant TSC, but I haven't dug into it since it seems
> fine to automatically use the system clock source in that case.
Yea, that's not worth investigating.
> This was without the early return, i.e. Virtualbox doesn't pass
> through CPUID even if the host has it on Intel CPUs.
Seems good.
> > What do you think about making pg_test_timing warn and return 1 if there is
> > a
> > tsc clocksource but the calibrated frequency differs by more than, idk, 10%?
> > I'm worried that there might be other problems like this lurking and we
> > wouldn't know about them unless the issue is of a similar magnitude.
>
> Yeah, that seems like a good idea. If I understand correctly you're
> thinking we could tell the user to switch to
> timing_clock_source=system in that case? (i.e. this is only a
> pg_test_timing notice, not something "smarter" in the backend itself)
I'd even just say "investigate your system an/or report a bug to postgres" :)
> Attached 0001 fixes the issue for me on my test instance, and
> presumably will fix drongo as well.
>
> 0002 is the updated version of emitting the additional debug info. I
> think this is certainly less critical to have in 19 now, but could
> still be useful if there are any future oddities.
I think we should do something, probably together with the test enhancement I
described, because otherwise we won't actually find potential breakage before
it hits production environments.
> @@ -161,10 +165,13 @@ static uint32 x86_hypervisor_tsc_frequency_khz(void);
> * 0 indicates the frequency information was not accessible via CPUID.
> */
> uint32
> -x86_tsc_frequency_khz(void)
> +x86_tsc_frequency_khz(char *source, size_t source_len)
> {
> unsigned int reg[4] = {0};
>
> + if (source)
> + strlcpy(source, "x86", source_len);
> +
> /*
> * If we're inside a virtual machine, try to fetch the TSC frequency
> from
> * the Hypervisor itself using specialized CPUID registers.
> @@ -173,7 +180,11 @@ x86_tsc_frequency_khz(void)
> * a virtual machine, as it has been observed to be wildly incorrect.
> */
> if (x86_feature_available(PG_HYPERVISOR))
> + {
> + if (source)
> + strlcat(source, ", hypervisor, cpuid 0x40000010",
> source_len);
> return x86_hypervisor_tsc_frequency_khz();
> + }
>
> /*
> * On modern Intel CPUs, the TSC is implemented by invariant
> timekeeping
Any reason you didn't include the hypervisor like in the prior version? Just
simplicity?
I think this actually ends up getting overwritten if
x86_hypervisor_tsc_frequency_khz() then "fails" to detect a frequency. Feels
like it'd be good to continue reporting that it's in a hypervisor, because
hypervisors can set tsc frequency multipliers and stuff.
What do you think about the attached incremental patch?
If I e.g. intentionally force the hypervisor path being taken, on a non-VM, I
get:
TSC frequency source: x86, hypervisor, cpuid 0x40000010, calibration
TSC frequency in use: 2497902 kHz
TSC frequency from calibration: 2497902 kHz
TSC clock source will be used by default, unless timing_clock_source is set
to 'system'.
And if rdtscp is not available:
TSC frequency source: x86, no rdtscp
TSC frequency in use: 0 kHz
TSC frequency from calibration: 2500040 kHz
TSC clock source is not usable. Likely unable to determine TSC frequency.
Are you running in an unsupported virtualized environment?
It's not perfect, but seems like it might be good enough?
Note to future self: Need to consider update the sgml docs example. Probably
just fudge it, to avoid having to update the numbers too.
Greetings,
Andres Freund
>From c9f9399c3ce0ef4136514994db90de84c08cf259 Mon Sep 17 00:00:00 2001
From: Lukas Fittl <[email protected]>
Date: Wed, 8 Apr 2026 10:11:56 -0700
Subject: [PATCH v26a 1/2] instrumentation: Show additional TSC clock source
info in pg_test_timing
In some cases its needed to understand whether TSC frequency data was
sourced from CPUID, and which of the registers. This shows this debug
information at the end of pg_test_timing, through use of a new
pg_timing_tsc_clock_source_info function, replacing the previous
pg_tsc_calibrate_frequency export that was only needed for debug info.
Author: Lukas Fittl <[email protected]>
Suggested-by: Andres Freund <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/jr4hk2sxhqcfpb67ftz5g4vw33nm67cgf7go3wwmqsafu5aclq%405m67ukuhyszz#2fdfd95b6a4a74410196999818e16cfc
---
src/include/port/pg_cpu.h | 2 +-
src/include/portability/instr_time.h | 11 ++++++-
src/common/instr_time.c | 44 +++++++++++++++++++++++--
src/port/pg_cpu_x86.c | 23 +++++++++++--
src/bin/pg_test_timing/pg_test_timing.c | 12 ++++---
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 81 insertions(+), 12 deletions(-)
diff --git a/src/include/port/pg_cpu.h b/src/include/port/pg_cpu.h
index a5d42f1b68d..db4cd62f7ef 100644
--- a/src/include/port/pg_cpu.h
+++ b/src/include/port/pg_cpu.h
@@ -56,7 +56,7 @@ x86_feature_available(X86FeatureId feature)
return X86Features[feature];
}
-extern uint32 x86_tsc_frequency_khz(void);
+extern uint32 x86_tsc_frequency_khz(char *source, size_t source_len);
#endif /* defined(USE_SSE2) || defined(__i386__) */
diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h
index 92558e234ac..4917728afc3 100644
--- a/src/include/portability/instr_time.h
+++ b/src/include/portability/instr_time.h
@@ -165,7 +165,16 @@ extern PGDLLIMPORT int32 timing_tsc_frequency_khz;
extern void pg_initialize_timing_tsc(void);
-extern uint32 pg_tsc_calibrate_frequency(void);
+typedef struct TscClockSourceInfo
+{
+ int32 frequency_khz; /* from CPUID or calibration */
+ int32 calibrated_frequency_khz; /* from calibration loop, 0 if did
+ * not converge, -1 if not yet run */
+ char frequency_source[128]; /* describes how frequency was
+ * determined */
+} TscClockSourceInfo;
+
+extern const TscClockSourceInfo *pg_timing_tsc_clock_source_info(void);
#endif /* PG_INSTR_TSC_CLOCK */
diff --git a/src/common/instr_time.c b/src/common/instr_time.c
index fc6e1852c30..2523d6a0df6 100644
--- a/src/common/instr_time.c
+++ b/src/common/instr_time.c
@@ -70,6 +70,8 @@ static void set_ticks_per_ns(void);
static void set_ticks_per_ns_system(void);
#if PG_INSTR_TSC_CLOCK
+static TscClockSourceInfo tsc_info = {.calibrated_frequency_khz = -1};
+
static bool tsc_use_by_default(void);
static void set_ticks_per_ns_for_tsc(void);
#endif
@@ -166,6 +168,7 @@ set_ticks_per_ns_system(void)
#if PG_INSTR_TSC_CLOCK
static void tsc_detect_frequency(void);
+static uint32 pg_tsc_calibrate_frequency(void);
/*
* Initialize the TSC clock source by determining its usability and frequency.
@@ -202,21 +205,36 @@ static void
tsc_detect_frequency(void)
{
timing_tsc_frequency_khz = 0;
+ tsc_info.frequency_khz = 0;
+ tsc_info.frequency_source[0] = '\0';
/* We require RDTSCP support and an invariant TSC, bail if not available */
if (!x86_feature_available(PG_RDTSCP) || !x86_feature_available(PG_TSC_INVARIANT))
return;
/* Determine speed at which the TSC advances */
- timing_tsc_frequency_khz = x86_tsc_frequency_khz();
+ timing_tsc_frequency_khz = x86_tsc_frequency_khz(tsc_info.frequency_source,
+ sizeof(tsc_info.frequency_source));
if (timing_tsc_frequency_khz > 0)
+ {
+ tsc_info.frequency_khz = timing_tsc_frequency_khz;
return;
+ }
/*
* CPUID did not give us the TSC frequency. We can instead measure the
* frequency by comparing ticks against walltime in a calibration loop.
*/
- timing_tsc_frequency_khz = pg_tsc_calibrate_frequency();
+ if (tsc_info.calibrated_frequency_khz < 0)
+ tsc_info.calibrated_frequency_khz = pg_tsc_calibrate_frequency();
+
+ timing_tsc_frequency_khz = tsc_info.calibrated_frequency_khz;
+ if (timing_tsc_frequency_khz > 0)
+ {
+ strlcpy(tsc_info.frequency_source, "x86, calibration",
+ sizeof(tsc_info.frequency_source));
+ tsc_info.frequency_khz = timing_tsc_frequency_khz;
+ }
}
/*
@@ -282,7 +300,7 @@ tsc_use_by_default(void)
#define TSC_CALIBRATION_ITERATIONS 1000000
#define TSC_CALIBRATION_SKIPS 100
#define TSC_CALIBRATION_STABLE_CYCLES 10
-uint32
+static uint32
pg_tsc_calibrate_frequency(void)
{
instr_time initial_wall;
@@ -369,4 +387,24 @@ pg_tsc_calibrate_frequency(void)
return (uint32) freq_khz;
}
+/*
+ * Returns TSC clock source information for diagnostic purposes.
+ *
+ * On first call, runs the TSC calibration loop (if not already done during
+ * frequency detection) which may take up to TSC_CALIBRATION_MAX_NS.
+ * Subsequent calls return cached results.
+ *
+ * Note: This won't return the right info in EXEC_BACKEND builds if this were
+ * used in the backend (which it currently is not), as tsc_info is not copied
+ * using read_backend_variables - only the TSC frequency is.
+ */
+const TscClockSourceInfo *
+pg_timing_tsc_clock_source_info(void)
+{
+ if (tsc_info.calibrated_frequency_khz < 0)
+ tsc_info.calibrated_frequency_khz = pg_tsc_calibrate_frequency();
+
+ return &tsc_info;
+}
+
#endif /* PG_INSTR_TSC_CLOCK */
diff --git a/src/port/pg_cpu_x86.c b/src/port/pg_cpu_x86.c
index 150b4a1d574..c097807fc7f 100644
--- a/src/port/pg_cpu_x86.c
+++ b/src/port/pg_cpu_x86.c
@@ -13,7 +13,11 @@
*-------------------------------------------------------------------------
*/
-#include "c.h"
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
#if defined(USE_SSE2) || defined(__i386__)
@@ -161,10 +165,13 @@ static uint32 x86_hypervisor_tsc_frequency_khz(void);
* 0 indicates the frequency information was not accessible via CPUID.
*/
uint32
-x86_tsc_frequency_khz(void)
+x86_tsc_frequency_khz(char *source, size_t source_len)
{
unsigned int reg[4] = {0};
+ if (source)
+ strlcpy(source, "x86", source_len);
+
/*
* If we're inside a virtual machine, try to fetch the TSC frequency from
* the hypervisor, using a hypervisor specific method.
@@ -174,7 +181,11 @@ x86_tsc_frequency_khz(void)
* to be wildly incorrect when virtualized.
*/
if (x86_feature_available(PG_HYPERVISOR))
+ {
+ if (source)
+ strlcat(source, ", hypervisor, cpuid 0x40000010", source_len);
return x86_hypervisor_tsc_frequency_khz();
+ }
/*
* On modern Intel CPUs, the TSC is implemented by invariant timekeeping
@@ -207,6 +218,9 @@ x86_tsc_frequency_khz(void)
if (reg[EAX] == 0 || reg[EBX] == 0)
return 0;
+ if (source)
+ strlcat(source, ", cpuid 0x15", source_len);
+
return reg[ECX] / 1000 * reg[EBX] / reg[EAX];
}
@@ -217,7 +231,12 @@ x86_tsc_frequency_khz(void)
*/
pg_cpuid(0x16, reg);
if (reg[EAX] > 0)
+ {
+ if (source)
+ strlcat(source, ", cpuid 0x16", source_len);
+
return reg[EAX] * 1000;
+ }
return 0;
}
diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c
index 2afb0e6a410..ef5dd90c41f 100644
--- a/src/bin/pg_test_timing/pg_test_timing.c
+++ b/src/bin/pg_test_timing/pg_test_timing.c
@@ -184,7 +184,7 @@ static void
test_tsc_timing(void)
{
uint64 loop_count;
- uint32 calibrated_freq;
+ const TscClockSourceInfo *info;
printf("\n");
loop_count = test_timing(test_duration, TIMING_CLOCK_SOURCE_TSC, false);
@@ -198,11 +198,13 @@ test_tsc_timing(void)
output(loop_count);
printf("\n");
- printf(_("TSC frequency in use: %u kHz\n"), timing_tsc_frequency_khz);
+ info = pg_timing_tsc_clock_source_info();
+ printf(_("TSC frequency in use: %d kHz\n"), info->frequency_khz);
+ if (info->frequency_source[0] != '\0')
+ printf(_("TSC frequency source: %s\n"), info->frequency_source);
- calibrated_freq = pg_tsc_calibrate_frequency();
- if (calibrated_freq > 0)
- printf(_("TSC frequency from calibration: %u kHz\n"), calibrated_freq);
+ if (info->calibrated_frequency_khz > 0)
+ printf(_("TSC frequency from calibration: %d kHz\n"), info->calibrated_frequency_khz);
else
printf(_("TSC calibration did not converge\n"));
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ea95e7984bc..2137f1e0cfe 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3090,6 +3090,7 @@ TParserStateActionItem
TQueueDestReceiver
TRGM
TSAnyCacheEntry
+TscClockSourceInfo
TSConfigCacheEntry
TSConfigInfo
TSDictInfo
--
2.53.0.1.gb2826b52eb
>From f477523f80ff41271131c430df5b1408902b8aab Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Thu, 9 Apr 2026 11:55:58 -0400
Subject: [PATCH v26a 2/2] fixup! instrumentation: Show additional TSC clock
source info in pg_test_timing
---
src/include/portability/instr_time.h | 2 ++
src/common/instr_time.c | 18 +++++++++++++++--
src/port/pg_cpu_x86.c | 3 ---
src/bin/pg_test_timing/pg_test_timing.c | 27 +++++++++++++++++--------
4 files changed, 37 insertions(+), 13 deletions(-)
diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h
index 4917728afc3..46307fd47a9 100644
--- a/src/include/portability/instr_time.h
+++ b/src/include/portability/instr_time.h
@@ -161,6 +161,7 @@ extern PGDLLIMPORT bool timing_tsc_enabled;
*/
extern PGDLLIMPORT int32 timing_tsc_frequency_khz;
+
#if PG_INSTR_TSC_CLOCK
extern void pg_initialize_timing_tsc(void);
@@ -178,6 +179,7 @@ extern const TscClockSourceInfo *pg_timing_tsc_clock_source_info(void);
#endif /* PG_INSTR_TSC_CLOCK */
+
/*
* Returns the current timing clock source effectively in use, resolving
* TIMING_CLOCK_SOURCE_AUTO to either TIMING_CLOCK_SOURCE_SYSTEM or
diff --git a/src/common/instr_time.c b/src/common/instr_time.c
index 2523d6a0df6..0db19673490 100644
--- a/src/common/instr_time.c
+++ b/src/common/instr_time.c
@@ -208,9 +208,23 @@ tsc_detect_frequency(void)
tsc_info.frequency_khz = 0;
tsc_info.frequency_source[0] = '\0';
+ strlcat(tsc_info.frequency_source, "x86",
+ sizeof(tsc_info.frequency_source));
+
/* We require RDTSCP support and an invariant TSC, bail if not available */
- if (!x86_feature_available(PG_RDTSCP) || !x86_feature_available(PG_TSC_INVARIANT))
+ if (!x86_feature_available(PG_RDTSCP))
+ {
+ strlcat(tsc_info.frequency_source, ", no rdtscp",
+ sizeof(tsc_info.frequency_source));
return;
+ }
+
+ if (!x86_feature_available(PG_TSC_INVARIANT))
+ {
+ strlcat(tsc_info.frequency_source, ", not invariant",
+ sizeof(tsc_info.frequency_source));
+ return;
+ }
/* Determine speed at which the TSC advances */
timing_tsc_frequency_khz = x86_tsc_frequency_khz(tsc_info.frequency_source,
@@ -231,7 +245,7 @@ tsc_detect_frequency(void)
timing_tsc_frequency_khz = tsc_info.calibrated_frequency_khz;
if (timing_tsc_frequency_khz > 0)
{
- strlcpy(tsc_info.frequency_source, "x86, calibration",
+ strlcat(tsc_info.frequency_source, ", calibration",
sizeof(tsc_info.frequency_source));
tsc_info.frequency_khz = timing_tsc_frequency_khz;
}
diff --git a/src/port/pg_cpu_x86.c b/src/port/pg_cpu_x86.c
index c097807fc7f..56925e9aa51 100644
--- a/src/port/pg_cpu_x86.c
+++ b/src/port/pg_cpu_x86.c
@@ -169,9 +169,6 @@ x86_tsc_frequency_khz(char *source, size_t source_len)
{
unsigned int reg[4] = {0};
- if (source)
- strlcpy(source, "x86", source_len);
-
/*
* If we're inside a virtual machine, try to fetch the TSC frequency from
* the hypervisor, using a hypervisor specific method.
diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c
index ef5dd90c41f..5cb551fa7b0 100644
--- a/src/bin/pg_test_timing/pg_test_timing.c
+++ b/src/bin/pg_test_timing/pg_test_timing.c
@@ -197,17 +197,28 @@ test_tsc_timing(void)
loop_count = test_timing(test_duration, TIMING_CLOCK_SOURCE_TSC, true);
output(loop_count);
printf("\n");
+ }
- info = pg_timing_tsc_clock_source_info();
- printf(_("TSC frequency in use: %d kHz\n"), info->frequency_khz);
- if (info->frequency_source[0] != '\0')
- printf(_("TSC frequency source: %s\n"), info->frequency_source);
+ /*
+ * Report TSC information regardless of whether it was usable, makes
+ * debugging a lot easier.
+ */
+ info = pg_timing_tsc_clock_source_info();
+ if (info->frequency_source[0] != '\0')
+ printf(_("TSC frequency source: %s\n"), info->frequency_source);
+ printf(_("TSC frequency in use: %d kHz\n"), info->frequency_khz);
- if (info->calibrated_frequency_khz > 0)
- printf(_("TSC frequency from calibration: %d kHz\n"), info->calibrated_frequency_khz);
- else
- printf(_("TSC calibration did not converge\n"));
+ if (info->calibrated_frequency_khz > 0)
+ printf(_("TSC frequency from calibration: %d kHz\n"), info->calibrated_frequency_khz);
+ else
+ printf(_("TSC calibration did not converge\n"));
+ /*
+ * Report whether TSC was usable and, if so, whether it will be used
+ * automatically.
+ */
+ if (loop_count > 0)
+ {
pg_set_timing_clock_source(TIMING_CLOCK_SOURCE_AUTO);
if (pg_current_timing_clock_source() == TIMING_CLOCK_SOURCE_TSC)
printf(_("TSC clock source will be used by default, unless timing_clock_source is set to 'system'.\n"));
--
2.53.0.1.gb2826b52eb