On 15.01.2025 12:47, Ilia Evdokimov wrote:
On 06.01.2025 18:57, Andrey M. Borodin wrote:
1. This code seems a little different from your patch. It is trying to avoid engaging the PRNG. I'm not sure it's a good idea, but still. Also, it uses "<=", not "<".

    xact_is_sampled = log_xact_sample_rate != 0 &&
        (log_xact_sample_rate == 1 ||
         pg_prng_double(&pg_global_prng_state) <= log_xact_sample_rate);

Sorry for the delayed reply. Andrey was right about this suggestion: first, it makes the code more readable for others, and second, it avoids engaging the PRNG at the edge values of 0.0 and 1.0. I've attached patch v11 with these changes.
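
For reference, a minimal sketch of how the same pattern could look inside pg_stat_statements, assuming a hypothetical pgss_sample_rate GUC and helper function (the actual names in v11 may differ):

    #include "postgres.h"
    #include "common/pg_prng.h"

    /* Hypothetical GUC name; fraction of queries to track, 0.0..1.0. */
    static double pgss_sample_rate = 1.0;

    /*
     * Decide whether the current query should be tracked.  The edge-value
     * checks skip the PRNG entirely at 0.0 and 1.0, and "<=" mirrors the
     * existing log_xact_sample_rate code quoted above.
     */
    static bool
    query_is_sampled(void)
    {
        return pgss_sample_rate != 0.0 &&
            (pgss_sample_rate == 1.0 ||
             pg_prng_double(&pg_global_prng_state) <= pgss_sample_rate);
    }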

Patch looks fine. Thank you!

On 14.01.2025 15:00, Ilia Evdokimov wrote:

Alena, Sami – I apologize for not including you in the previous email. If you're interested in this approach, I'm open to any suggestions.

[0]: https://www.postgresql.org/message-id/1b13d748-5e98-479c-9222-3253a734a038%40tantorlabs.com

This is a difficult question. I tend to agree with Alexander Korotkov's proposal to add a filter that records only those queries whose statistics satisfy the configured conditions. However, I don't believe query execution time is a sufficient metric for this purpose. It is too unstable and influenced by many external factors, such as system load. For instance, background processes such as vacuum, the checkpointer, or the background writer could be running at the same time.

Additionally, a query may take a long time to execute simply because another large query is consuming most of the system's resources at the same time. In such cases, the long execution time of the query that triggered the filter may not indicate anything remarkable. Furthermore, statistics for resource-intensive queries may appear normal if the query simply processed a large volume of data, for example when it involves a Cartesian product or a full join.

Therefore, I think the idea of using filters is more promising, especially if multiple filters are implemented. For example, we could add filters for buffer usage (pages read or modified), differences in cardinality (predicted vs. actual), username, application name, and other criteria. These filters would help reduce the volume of queries tracked by the pg_stat_statements extension. However, there might be challenges in keeping this state up to date, given the volatile and unstable nature of the system.
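
To make this concrete, here is a rough sketch of what such a filter check might look like at the point where pg_stat_statements decides whether to store a query. The GUC names and thresholds are purely illustrative; nothing like this exists yet:

    #include "postgres.h"

    /* Hypothetical filter GUCs; purely illustrative. */
    static int    pgss_filter_min_shared_blks = 0;  /* pages read or dirtied */
    static double pgss_filter_card_error = 0.0;     /* actual/estimated ratio */

    /*
     * Sketch: return true if a finished query passes the configured
     * filters and should be stored by pg_stat_statements.
     */
    static bool
    query_passes_filters(int64 shared_blks_accessed,
                         double estimated_rows, double actual_rows)
    {
        /* Buffer-usage filter: skip queries that touched few pages. */
        if (shared_blks_accessed < pgss_filter_min_shared_blks)
            return false;

        /*
         * Cardinality filter: keep only queries whose row estimate was
         * off by at least the configured factor, in either direction.
         */
        if (pgss_filter_card_error > 1.0 && estimated_rows > 0.0)
        {
            double ratio = actual_rows / estimated_rows;

            if (ratio < pgss_filter_card_error &&
                ratio > 1.0 / pgss_filter_card_error)
                return false;
        }

        return true;
    }

Filters on username or application name could be checked the same way against the backend's current role and application_name.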

Your proposed parameter could also reduce the volume of queries for which statistics need to be saved, but it is too unpredictable for analysis, and it is unclear how the resulting data should be interpreted. For instance, a query that genuinely impacted performance might not be recorded by pg_stat_statements simply due to randomness, while a small, insignificant query could be selected instead. Analyzing such statistics might lead to misleading conclusions.

--
Regards,
Alena Rybakina
Postgres Professional
