Hi, On 2023-01-13 09:11:06 +0100, David Geier wrote: > Mostly I'm wondering if the sampling based approach gains us enough to be > worth it, once the patch to use RDTSC hopefully landed (see [1]).
Well, I'm not sure we have a path forward on it. There's portability and accuracy concerns. But more importantly: > I believe that with the RDTSC patch the overhead of TIMING ON is lower than > the overhead of using ANALYZE with TIMING OFF in the first place. Hence, to > be really useful, it would be great if we could on top of TIMING SAMPLING > also lower the overhead of ANALYZE itself further (e.g. by using a fast path > for the default EXPLAIN (ANALYZE, TIMING ON / SAMPLING)). Currently, > InstrStartNode() and InstrStopNode() have a ton of branches and without all > the typically deactivated code the implementation would be very small and > could be placed in an inlinable function. Yes, I think SAMPLING could get rid of most of the instrumentation overhead - at the cost of a bit less detail in the explain, of course. Which could make it a lot more feasible to enable something like auto_explain.log_timing in busy workloads. For the sampling mode we don't really need something like InstrStart/StopNode. We just need a pointer to node currently executing - not free to set, but still a heck of a lot cheaper than InstrStopNode(), even without ->need_timer etc. Then the timer just needs to do instr->sampled_total += (now - last_sample) last_sample = now I've been thinking that we should consider making more of the instrumentation code work like that. The amount of work we're doing in InstrStart/StopNode() has steadily risen. When buffer usage and WAL usage are enabled, we're executing over 1000 instructions! And each single Instrumentation node is ~450 bytes, to a good degree due to having 2 BufUsage and 2 WalUsage structs embedded. If we instead have InstrStartNode() set up a global pointer to the Instrumentation node, we can make the instrumentation code modify both the "global" counters (pgBufferUsage, pgWalUsage) and, if set, current_instr->{pgBufferUsage, pgWalUsage}. That'll require some larger changes - right now nodes "automatically" include the IO/WAL incurred in child nodes, but that's just a small bit of additional summin-up to be done during EXPLAIN. Separately, I think we should consider re-ordering Instrumentation so that bufusage_start, walusage_start are after the much more commonly used elements. We're forcing ntuples, nloops, .. onto separate cachelines, even though they're accounted for unconditionally. Greetings, Andres Freund